ABSTRACT


The purpose of this study was to test the influence of phase on the
quality of speech reproduced by a speaker dependent compression system. The
tests consisted of compressing frequency domain speech vectors using the
Karhunen-Loeve Transform, with and without phase, then making subjective
judgements as to the reproduced quality. Error metrics were then tested for
their suitability as predictors of reproduced quality.

The compression software transformed each speech vector into a
vector of complex Fourier coefficients (only half of the coefficients are
needed as the transform is Hermitian). Phase was preserved by using the real
frequency components to form one vector and the corresponding imaginary
components to form a second vector of real numbers; the two vectors were then
separately compressed. The expanded vectors were recombined and speech
reconstructed by inverse Fourier transformation.

Compression  ratios  of  8:1  could  be  achieved  without  any 
perceivable  difference  between  the  original  speech  and  reconstructed  speech  by 
minimizing  the  MSE  of  each  vector  of  the  pair.  The  8:1  Compression  Ratio 
corresponded  to  a  covariance  matrix  Condition  Number  of  200. 

Recommendations  for  further  study  into  voice  characterization  and 
an  optimal  transform  for  speech  are  made. 
CHAPTER ONE


INTRODUCTION 


Speech  Compression 


There are a number of signal processing techniques applied to
speech signals which are associated with the term 'compression'. One common
compression technique is to use a non-linear quantization on analog signals so
that better resolution of low amplitudes can be achieved. Another
compression technique is to reduce the bandwidth of speech so that less
channel bandwidth is needed to transmit the intelligence. In both cases
the signals are compressed, the first in amplitude and the second in
frequency. This thesis is concerned with the frequency compression of speech
and all further references to speech compression should be understood as such.

The  reasons  for  speech  compression  are  many.  Speech  signals  that 
have  been  pre-processed  to  reduce  their  bandwidth  are  simpler  signals  as  the 
redundant  material  has  been  removed.  One  advantage  of  this  simplification  is 
that  applications  such  as  Automatic  Speech  Recognition  (ASR)  can  be 
implemented using a system that is less elaborate than that required if the
compression step were omitted.

Communications is the application which has motivated most speech
compression research. Compressed speech requires less bandwidth than full
bandwidth speech and so increases the throughput of a communication network
by allowing more channels to be allocated to a frequency band. The military
uses speech compression to increase the throughput of its communication
systems (in particular those networks connecting mobile nodes) and as a means
of implementing narrow band secure voice systems[4].
Primary Issues

Regardless of the compression ratio (the ratio of
the original bandwidth to the bandwidth of the compressed speech), the
performance of a compression system is ultimately decided by the
intelligibility and quality of the reconstructed speech (this is not
to say that compression ratio is unimportant).

Intelligibility 

Intelligibility is a term used to describe how well the meaning of a
communication (its intelligence) is passed. It is perhaps most easily
explained in terms of the talker (originator) asking the listener to repeat
back  the  message.  If  the  original  message  matches  that  sent  back  then  there 
was  high  intelligibility.  Intelligibility  does  not  suggest  the  need  to 
recognize  the  talker  by  their  voice,  only  what  was  said. 

Quality 

Initially,  let  the  quality  of  speech  be  defined  as  the  degree  to 
which  a  listener  recognizes  the  talker's  voice.  The  definition  includes  the 
ability  to  understand  the  intelligence  of  the  message  and  the  ability  to  at 
least  hear,  if  not  recognize,  the  sounds  which  characterize  the  talker. 
Therefore,  the  intelligibility  aspects  of  speech  are  considered  a  subset  of 
the  quality  aspects  of  speech.  This  definition  is  somewhat  naive  in  that  the 
listener  may  not  know  the  talker  and  so  could  not  be  expected  to  recognize 
them by their voice over a communications channel. A more objective
definition is that of toll quality, which is the level of quality required to
make the reconstructed speech indistinguishable from the bandlimited
speech[4].

Metrics 

Many  authors  provide  definitions  of  intelligibility  and  quality. 
The  above  definitions  are  non-specific  in  that  there  are  not  any  objective 
measurements  that  can  be  applied.  Intelligibility  could  be  measured  by  using 
some  sort  of  word  error  rate  metric  but,  because  of  human  intuition,  whole 
words  could  be  misunderstood  and  yet  the  intelligence  still  conveyed 
accurately.  However,  a  word  error  rate  metric  would  be  an  objective  measure 
of intelligibility. The definition for toll quality is succinct and easily
understood; however, no indication is given of "the level of quality" or how
this level is measured. It is then important to decide on some metric by which
compression  ratios  and  overall  system  performance  can  be  measured  so  that 
application  constraints,  such  as  available  channel  bandwidth  and  the  need  to 
understand  the  communication,  can  be  used  for  design  criteria. 

Modelling 

Perhaps  the  most  important  issue  is  that  of  modelling  human  speech 
and  hearing.  A  variety  of  speech  compression  models  exist,  the  most  popular 
of  which  is  the  speaker  independent  Linear  Predictive  Coding  (LPC)  model[4]. 
Modelling,  in  general,  will  be  further  discussed  in  the  following  section  on 
speech  characterization. 

Speech  Characterization 

Modelling  is  a  standard  and  valid  scientific  approach  to 
characterizing  any  process.  A  parametric  speech  model  is  most  desirable  as 
varying the parameters produces different sounds and so speech, consisting of
a  range  of  sounds,  can  be  represented  by  different  sets  of  model  parameters. 
If  the  parameters  of  the  model  can  be  represented  by  a  lower  number  of 
quantities  than  the  speech  sample  being  modelled,  then  compression  has  been 
achieved. 

It  is  typical  for  the  model  to  only  approximate  the  original 
speech  which  leads  to  a  reconstruction  error.  Expectations  are  that  the 
lower  the  reconstruction  error,  the  better  the  quality  of  the  reconstruction 
(assuming  a  suitable  metric  is  used).  However,  speech  is  a  highly  redundant 
signal  for  conveying  intelligence  and  large  reconstruction  errors  can  be 
tolerated without reducing intelligibility. If intelligibility is the most
important performance criterion, then a speaker independent compression model
such as LPC would be useful.

The  LPC  model,  or  its  variants,  is  used  effectively  in  speech 
compression  applications  as  it  provides  intelligible,  speaker  independent, 
reproduction  of  compressed  speech.  A  typical  compression  ratio  is  that  of 
the  STU-III  which  compresses  64  Kbps  to  2400  bps[4].  The  major  problem  with 
LPC  based  systems  is  that  of  noise[4]  (noisy  signals  corrupt  LPC  model). 
Further,  the  quality  of  the  reproduction  is  reported  as  poor  because  speakers 
cannot  be  ldentlfied[4] . 
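
As a worked illustration of the STU-III figures quoted above, and assuming the bit rates are used directly as the measure of compression (the thesis itself defines the ratio in terms of bandwidth), the ratio is roughly

```latex
\[
\text{compression ratio} \;=\; \frac{64\,000\ \text{bps}}{2400\ \text{bps}} \;\approx\; 26.7,
\qquad \text{i.e. about } 27{:}1 .
\]
```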

When  the  quality  of  reproduced  speech  is  discussed,  most  often  the 
results  of  communications  applications  are  presented  (as  in  the  example 
above).  But  speech  compression  is  also  used  in  speech  recognition 
applications  which  generally  do  not  require  the  quality  aspects  of  speech  to 
be preserved. Consequently, the speaker independence of the LPC model makes
it useful for speaker independent speech recognition applications (an LPC
model is used to compress utterances to the vocal tract model thereby reducing
the dimensionality of the feature space to the number of model parameters).
However, if speaker verification is viewed as a recognition task with the
additional  requirement  of  preserving  the  quality  aspects  that  characterize  a 
voice,  then  the  LPC  compression  model  is  not  satisfactory.  Because  of  this 
need  to  preserve  the  quality  of  speech,  a  compression  model  which  supports 
speaker  verification  should  also  be  applicable  to  communication  applications. 

One of the principles of recognition is to learn a speaker's voice
patterns using labelled data (i.e. get a speaker to say pre-determined words)
and then recognize subsequent voice patterns. Studying voice patterns may
provide some insight into the compression model capable of quality speech
reproduction.

Proposal

It  is  believed  that  the  quality  of  a  speech  compression  system 
such  as  LPC  can  be  improved  if  the  speaker  independence  is  traded  for  speaker 
dependence. Therefore, a speaker-dependent frequency domain speech
compression  system  is  proposed.  Users  will  first  train  the  system  by 
generating  a  training  set  from  which  the  principal  components  of  the  speech 
will  be  derived.  Subsequent  speech  inputs  will  be  mapped  into  the  compressed 
space  spanned  by  the  principal  components  and  transmitted.  At  the  receiver, 
the  compressed  space  is  mapped  back  to  the  dimensions  of  the  original  space 
and  the  speech  reproduced. 
The training set will consist of frequency vectors which are the
Fourier  Transform  (FT)  of  discrete-time  samples  of  a  user's  voice.  The 
frequency  vectors  will  form  a  covariance  matrix  of  which  the  eigenvectors  will 
form  the  rows  of  a  Karhunen-Loeve  (KL)  Transform  matrix.  When  used,  the 
frequency  vectors  will  be  multiplied  by  the  KL  transform  matrix  to  produce  a 
set  of  KL  coefficients.  The  resulting  set  of  coefficients  will  be 
transmitted  through  a  narrow-band  channel. 

Prior to transmission, the KL transform matrix will be transmitted
to the receiver where it will be inverted and used to inverse transform the
set of KL coefficients upon reception. Note that the KL transform matrix is
made up of orthonormal vectors; consequently, matrix inversion can be achieved
by transposing the matrix.
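
The train, compress and expand steps of the proposal can be sketched as follows. This is a minimal illustration under stated assumptions, not the software developed for this research: the function names `train_kl`, `compress` and `expand` are hypothetical, the training data are synthetic real vectors rather than speech spectra, and NumPy's `eigh` stands in for whatever eigen-solver is actually used.

```python
import numpy as np

def train_kl(training_vectors, n_keep):
    """Derive a KL transform (rows = leading eigenvectors) from a training set.

    training_vectors: array of shape (N, d), one real vector per row.
    Returns the vector of means and an (n_keep, d) transform matrix.
    """
    mean = training_vectors.mean(axis=0)
    centered = training_vectors - mean
    cov = centered.T @ centered / len(training_vectors)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_keep]    # indices of the largest eigenvalues
    return mean, eigvecs[:, order].T

def compress(x, mean, kl):
    # Map a vector into the compressed space (the KL coefficients).
    return kl @ (x - mean)

def expand(coeffs, mean, kl):
    # The rows of kl are orthonormal, so the transpose maps back.
    return kl.T @ coeffs + mean

# Synthetic stand-in for a training set: 16-dimensional vectors that lie
# close to a 4-dimensional subspace (an assumption made for illustration).
rng = np.random.default_rng(0)
basis = rng.standard_normal((4, 16))
train = rng.standard_normal((500, 4)) @ basis + 0.01 * rng.standard_normal((500, 16))

mean, kl = train_kl(train, n_keep=4)
x = train[0]
x_hat = expand(compress(x, mean, kl), mean, kl)
print("coefficients sent:", compress(x, mean, kl).shape[0], "of", x.shape[0])
print("reconstruction error:", np.linalg.norm(x - x_hat))
```

When all rows are kept the transpose is the exact matrix inverse; when only the leading rows are kept, the transpose gives the reconstruction within the retained subspace.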

Scope 


This thesis will use the Karhunen-Loeve transformation for speech
compression, examining the effect of preserving phase and the suitability of
error metrics for quantifying reproduced quality. The results will also be
used to draw conclusions about the effectiveness of characterizing a voice by
a set of phonemes.

Assumptions 


Voice  Characterization  by  Phonemes 

The  first  of  the  assumptions  is  that  a  speaker  can  be  properly 
characterized  by  learning  the  range  of  sounds  that  he,  or  she,  makes.  The 
system proposed builds a model of the user's voice from a set of phonemes.
The phoneme set is represented by sentences which consist of all possible
phoneme bigrams (i.e. 40 phonemes have 40^2 bigrams). Once a comprehensive
sample  of  the  user's  voice  is  obtained,  the  goal  becomes  one  of  extracting  the 
information  that  describes  quality  (the  foundation  of  the  proposal). 
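
The size of the bigram set can be checked with a short enumeration. The phoneme labels below are placeholders invented for the example, and ordered pairs (including a phoneme followed by itself) are assumed.

```python
from itertools import product

# Placeholder labels p0..p39 standing in for a 40-phoneme inventory.
phonemes = [f"p{i}" for i in range(40)]

# Every ordered pair of phonemes is one bigram.
bigrams = list(product(phonemes, repeat=2))
print(len(bigrams))
```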

Suitability  of  The  Fourier  Transform 

The  second  assumption  is  that  the  linear  frequency  scale  of  the  FT 
will  be  suitable  to  represent  speech.  The  doubt  results  from  the  linear 
frequency  scale  of  the  FT  versus  the  non-linear  frequency  scale  of  human 
perception. 

Applicability of The Karhunen-Loeve Transform

It  is  also  assumed  that  the  KL  transform  will  yield  a  compression 
model  that  concisely  represents  the  quality  aspects  of  speech.  This 
assumption is based upon the optimality criterion of the KL transform which,
essentially,  selects  the  most  important  frequencies  for  transmission  by 
minimizing  the  mean-square-error  (MSE)  between  the  original  and  reproduced 
frequency  vectors.  By  minimizing  MSE  it  is  expected  that  the  quality  aspects 
of  speech  will  be  preserved. 

Implementation 

The proposed method of implementation is to use a parallel
structure to make vector multiplications. An ideal structure for this is the
Artificial Neural Network (ANN), in which the input frequency vector is
presented to the layer of input nodes and the eigenvectors of the KL transform
are the weights connecting the input layer to the hidden layer.


Quality  Levels 


Before the results of any tests can be compared, some sort of
measurement standards need to be established. Speech quality is highly
subjective as human perception is involved. In particular, if a listener is
familiar with the speaker's voice then making impartial decisions on quality
is difficult. Consequently, some 'quality levels' need to be defined.

Excellent  Quality.  Excellent  quality  is  achieved  when  the 
reproduction  is  indistinguishable  from  the  original. 

Good Quality. Good quality is achieved when the speaker is
recognizable (i.e. the signal may be noisier than the original but without
distorting the speech).

Intelligible.  Intelligible  reproduction  is  speech  that  can  be 
understood  without  the  listener  having  to  interpolate  confusing  or  badly 
distorted  words. 

Unintelligible. Unintelligible speech is defined here as speech
which  cannot  be  clearly  understood. 

Influence  of  Phase 

There is no known reason for assuming that phase will affect
reproduction quality. Speech recognizers do not require phase information;
however, it is wrong to assume phase plays no role in quality. There are
many non-speech examples of the importance of phase information; optics
contains many and it is certain that other disciplines do also.

Error  Metrics 

The MSE is often used as an error metric. However, if the phase
spectrum is distorted in such a way as to maintain magnitude, then it is
expected that the reproduced quality will decrease even though the MSE may in
fact decrease. What is needed is a way of measuring the relationship between
the frequencies of a speech vector.

Summary


Speech  is  compressed  to  reduce  the  bandwidth  requirements  of  voice 
communications  and  as  a  pre-processing  stage  to  speech  recognition. 
Intelligibility is the criterion driving narrow-band secure voice
communications and is implied in the performance of speech recognition
systems. However, speaker verification applications require reduced
bandwidth while maintaining the quality aspects of speech.

A speaker-dependent speech compression system which maintains the
quality aspects of the user's voice is proposed. The system will model the
user's  voice  by  'learning'  a  'training  set'  of  naturally  spoken  phonemes  then 
compress  subsequent  utterances  using  the  characterization  model.  If  used  for 
communications,  the  speech  will  be  reconstructed  from  the  compressed  data 
transmitted.  The  goal  is  to  achieve  toll  quality. 
The scope of the study is to firstly test the influence of phase
on  quality  and  the  suitability  of  some  error  metrics  used  to  measure  quality. 
The  assumptions  concerning  the  characterization  of  a  voice  with  a  set  of 
phonemes  and  the  preservation  of  quality  with  the  KL  transform  will  be 
discussed. 

Subsequent  Chapters 

Chapter  two  contains  a  search  of  relevant  literature  and 
discussions  on  topics  raised.  The  actual  experiments  performed  are  detailed 
in  chapter  three.  Experimental  results  are  discussed  in  chapter  four  while 
chapter  five  contains  conclusions  and  recommendations  for  further  research. 
Software  developed  for  the  research  is  listed  in  the  appendices. 
Weinstein's review of military speech processing applications[4]
revealed that the LPC algorithm is the most popular of all compression
techniques used for communications. Parsons[5] points out the
versatility of LPC by describing the technique's use as a pre-processing stage
for speech recognition systems. Because LPC is popular it is worth
investigating how it works. Because the method has deficiencies, namely
intolerance to noise and poor reproduction quality[4], understanding the
principles of LPC might provide some insight into the broader compression
problems.


Linear  Prediction  Coding 

Linear Predictive Coding is an application of adaptive linear
prediction. Strobach[6] writes, "The linear prediction model provides a
parametric description of an observed process." He continues with the
development, "The idea of parametric process modelling in context with linear
prediction methods is based on the assumption that a signal is completely
determined in terms of its first-order (mean) and second-order (covariance)
information when only the parameters and the excitation of the generating
filter (process model) are known." Essentially, the next value associated
with an observation can be predicted if a sequence of values associated with
the preceding observations is known and if the statistics of the process
under observation are stationary. The accuracy of the prediction increases
with the length of the sequence of preceding values (i.e. many preceding
values, high accuracy). Using this technique, a set of previous discrete-time
measurements (set of frames of samples) can be used to generate a speech
model that predicts the current measurement (frame of samples) and the current
measurement is used to update the speech model for the subsequent prediction.

The speech model is derived from the physiology of speech
production. The vocal tract is a flexible hollow muscle which can be
deformed by the speaker to produce different sounds[7]. Excitation for
the vocal tract can be air modulated by the vibrating vocal cords (voiced
sounds) or air passing through the rigid (non-vibrating) vocal cords
(unvoiced sounds). The LPC model samples the utterance over a period of 20 - 50
milliseconds (assuming stationarity) and models the vocal tract configuration
by cylinders of differing cross-sectional areas. The excitation is modelled
as either noise (unvoiced sounds) or impulses at the same pitch as the
vibrating vocal cords (voiced sounds). The excitation and vocal tract model
are transmitted to a listener where the utterance is reconstructed by exciting
the model vocal tract.
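
The prediction step described above can be sketched with the autocorrelation method, one common way of estimating predictor coefficients. It is offered as an assumption for illustration and is not necessarily the formulation used in the cited references. The helper names `lpc_coefficients` and `predict` are hypothetical, and a synthetic second-order autoregressive signal stands in for a stationary frame of speech.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Estimate a[k] such that s[n] is approximated by sum_k a[k] * s[n-1-k]."""
    # Autocorrelation of the frame at lags 0, 1, 2, ...
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    # Normal equations R a = r, where R is the Toeplitz autocorrelation matrix.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])

def predict(frame, a):
    """Predict each sample from the len(a) samples that precede it."""
    order = len(a)
    pred = np.zeros_like(frame)
    for n in range(order, len(frame)):
        pred[n] = np.dot(a, frame[n - order:n][::-1])  # most recent sample first
    return pred

# Synthetic stationary "frame": s[n] = 1.3 s[n-1] - 0.6 s[n-2] + noise.
rng = np.random.default_rng(0)
e = rng.standard_normal(4000)
s = np.zeros(4000)
for n in range(2, 4000):
    s[n] = 1.3 * s[n - 1] - 0.6 * s[n - 2] + e[n]

a = lpc_coefficients(s, order=2)
residual = s[2:] - predict(s, a)[2:]
print("estimated coefficients:", a)
print("residual energy / signal energy:", np.sum(residual**2) / np.sum(s[2:]**2))
```

Only the predictor coefficients and a description of the excitation need to be sent, which is the source of the compression.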

The  popularity  of  LPC  is  derived  from  its  reproduction  of 
intelligibility  which,  for  communications,  is  the  essential  performance 
criterion.  However,  for  speaker  verification  the  quality  aspects  of  speech 
need  to  be  carried  with  the  model  and  LPC  does  not  provide  this  feature[4]. 
Where do the quality aspects of speech lie?

Makhoul[8] refers to the quality aspects of LPC in terms of the
accuracy of reproducing the predictor parameters, "It has been known for some
time that the quantization of the predictor parameters themselves is quite
inefficient since a large number of bits is required to retain the desired
fidelity in the reconstructed signal at the receiver[72]." This statement
appears to be suggesting that a quality reproduction is possible using linear
prediction, but the overhead involved is prohibitive.


Makhoul[8] also defines his error metric, E, in terms of the ratio
of original and reproduced power spectral envelopes:

$$E = \int \frac{P(\omega)}{\hat{P}(\omega)}\, d\omega \tag{1}$$

where $P$ is the original spectral envelope and
$\hat{P}$ is the reconstructed spectral envelope.


Notice that there is no way of including phase errors between the two
spectra using this error metric. Does the phase spectrum matter?
White[9] writes, "It is generally believed that all acoustic information
relevant to speech recognition is represented by the time evolution of the
power spectrum, with the phase component being relatively unimportant."
White's comments are related solely to speech recognition which does not
require reconstruction and so phase may not be important. However, for
applications that require quality reconstructions phase may be important.
Perhaps frequency domain coding can provide more information regarding the
importance of phase.
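
The blindness of a power-spectrum ratio to phase can be demonstrated numerically. The sketch below is an illustration under stated assumptions: a random frame stands in for speech, and the mean of the bin-by-bin power ratio is used as a simplified stand-in for a spectral-envelope metric rather than as Makhoul's exact definition.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256
x = rng.standard_normal(n)          # synthetic stand-in for a frame of speech
X = np.fft.rfft(x)

# Replace the phase of every bin with a random value while keeping the magnitude.
phase = rng.uniform(-np.pi, np.pi, size=X.shape)
phase[0] = np.angle(X[0])           # DC bin must stay real for a real signal
phase[-1] = np.angle(X[-1])         # so must the Nyquist bin (n is even)
Y = np.abs(X) * np.exp(1j * phase)
y = np.fft.irfft(Y, n=n)

# A metric built only from power spectra sees no difference ...
P_x = np.abs(np.fft.rfft(x)) ** 2
P_y = np.abs(np.fft.rfft(y)) ** 2
spectral_ratio = np.mean(P_x / P_y)

# ... although the time-domain waveforms differ substantially.
waveform_mse = np.mean((x - y) ** 2)
print("mean power-spectrum ratio:", spectral_ratio)
print("waveform mean-square error:", waveform_mse)
```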


Note: [72] is Makhoul's reference.


Frequency  Domain  Compression 


A frame of speech samples can be considered a vector of dimension
n, where n is the number of samples in the frame. The discrete-time samples
are real and so the Fourier Transform (FT) is Hermitian[10]. The FT is
completely invertible and, by Parseval's theorem, energy is conserved[11].
Therefore, the frame of n discrete-time speech samples can be represented by
complex coefficients at n/2 frequencies. The question to be answered is:
why transform to the frequency domain?


One  reason  is  that  the  DFT  (Discrete  Fourier  Transform)  produces  a 
set  of  measurements  describing  the  spectral  content  of  the  particular  frame  of 
speech.  These  measurements  are  often  viewed  as  a  set  of  cross-correlations 
between  the  discrete-time  speech  and  a  complex  frequency  set[12].  For 
some,  the  DFT  is  more  intuitive  than  the  discrete-time  representation  as  it  is 
a  set  of  measurements  of  'content'. 
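
The properties of the transform relied upon in this section (Hermitian symmetry for real input, invertibility, and conservation of energy) can be checked numerically. The sketch below uses NumPy's FFT routines on a random real frame, which is an assumption made for illustration. Note that `rfft` returns n/2 + 1 coefficients, because the real-valued DC and Nyquist terms are both kept.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
x = rng.standard_normal(n)          # synthetic real-valued frame
X = np.fft.fft(x)

# Hermitian symmetry: X[n-k] is the complex conjugate of X[k], k = 1 .. n-1.
hermitian = np.allclose(X[1:][::-1], np.conj(X[1:]))

# Only the non-redundant half is needed to rebuild the frame exactly.
half = np.fft.rfft(x)
x_back = np.fft.irfft(half, n=n)

# Parseval's theorem for NumPy's unnormalized forward transform.
energy_time = np.sum(x ** 2)
energy_freq = np.sum(np.abs(X) ** 2) / n

print("Hermitian:", hermitian, "| coefficients kept:", len(half))
print("energy (time, frequency):", energy_time, energy_freq)
```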

Another reason for investigating frequency domain compression
models is that AFIT has done some research in this area recently[13].


Previous  Work  at  AFIT 

The most recent speech compression work done at AFIT was completed
in December 1991 by Captain Shane Switzer[13]. Captain Switzer investigated
methods for better coding fricatives and plosives. Parsons'[5] description of
fricatives includes the use of the characteristic frequency spectrum which is
used to locate constrictions of the vocal tract. If the vocal tract is
completely shut off for a short time then quickly released, the sharp sound is
called a stop or plosive. Captain Switzer's research revealed that shorter
sampling windows were better for coding fricatives and plosives. This result
is reasonable as the statistics of such signals vary quickly and short
windows should be better for maintaining stationarity.

A  point  of  interest  regarding  Captain  Switzer's  compression 
technique  was  his  arbitrary  selection  of  frequencies  for  transmission.  He 
tried  a  number  of  selection  methods,  one  where  he  simply  selected  the 
frequencies  of  greatest  magnitude.  The  result  of  this  selection  technique 
was an intelligible reproduction with a compression ratio of 64000/2400
(equivalent to the STU-III using the LPC model[4]). Can reproduced quality
be improved by a better selection of frequencies? If so, how are these
'better' frequencies selected?

Pols  1971 

Louis Pols[14] applied 15 ms speech frames, bandlimited to 3
kHz, to a bank of 20 1/3 octave filters for spectral analysis. He combined
the outputs of the lowest three filters, and the next lowest two filter
outputs were similarly combined, to produce 17 spectral components. The 17
frequency components are analyzed using Principal Component Analysis (PCA).
There are two steps to PCA: firstly, a 17 x 17 covariance matrix is formed
from the 15 ms frames; then the covariance matrix is analyzed for the
directions of maximum variance.

Pols[14] experimented with 20 speakers speaking the same 20 Dutch
words. His results showed that the variance was greatest in the high
frequency regions where the vowels, nasals and fricatives varied the most.
Applying PCA he was able to reduce the 17 dimensional speech frames to 3
dimensions. Analysis of these dimensions showed that they represent 78.1% of
the energy and that no other dimension has more than 5.6% of the energy.
Dimension one (49.1%) appeared to discriminate between sonorants and
non-sonorants since it separates the low and high frequency ends of the spectrum
(sonorants are voiced sounds, i.e. the vocal tract excitation is modulated by
the vibrating vocal cords). Dimensions two and three had their highest
weightings in the spectral regions of the formant frequencies.

Speech frames were then mapped into the principal components space
and, after some time normalization, compared to the 20 reference words. A
total of 395 out of 400 words were correctly classified.

The  results  of  these  experiments  need  to  be  put  into  perspective. 
Firstly,  while  PCA  is  a  powerful  tool  for  analyzing  and  compressing  a  feature 
space,  and  Pols  has  shown  that  the  compressed  space  is  suitable  for  speech 
recognition,  the  same  speakers  and  words  were  used  for  both  training  and 
testing  (i.e.  the  actual  utterances  used  to  characterize  the  feature  space 
were different examples of the same utterances used for testing the system).

Secondly,  Pols'  results  give  no  indication  of  which  speaker  spoke 
the  word  that  was  classified. 

Thirdly, Pols makes an interesting comment when discussing time
normalization. "One possible correct time normalization, apart from a
non-linear approach[14], is to treat the individual sounds within an utterance
separately, taking into account the target positions (often not actually
reached) of short sounds in a context." This comment will be addressed later
in the chapter.

Lastly, the type of spectral analysis used is interesting in that
1/3 octave filters provided a constant bandwidth to center-frequency ratio
(df/f0) and covered the band to 8 kHz without any phase information. This
constant-Q filter arrangement will be discussed in relation to the suitability
of the Fourier Transform later in the chapter.
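
As a worked illustration of the constant ratio mentioned above, and assuming ideal base-2 third-octave bands whose edges lie at $2^{\pm 1/6} f_0$ about the center frequency $f_0$ (the actual filters used by Pols may differ), the bandwidth to center-frequency ratio is the same for every filter in the bank:

```latex
\[
\frac{\Delta f}{f_0}
  \;=\; \frac{2^{1/6} f_0 - 2^{-1/6} f_0}{f_0}
  \;=\; 2^{1/6} - 2^{-1/6}
  \;\approx\; 0.232,
\qquad
Q \;=\; \frac{f_0}{\Delta f} \;\approx\; 4.32 .
\]
```

By contrast, the bins of a discrete Fourier transform have a fixed absolute width, so their ratio to center frequency falls as frequency rises.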

Pols' paper shows that if the speech to be recognized is correctly
characterized, then the frequency domain is a suitable space in which to
perform recognition. Also, some of the energy can be removed (in this case
21.9% is unnecessary) by only extracting the principal components of a speech
sample.


Analysis  of  the  training  set  to  find  the  directions  of  maximum 
variance  and  then  using  these  directions  of  most  significance  to  describe  a 
compressed space could be a technique for finding the important
frequencies  to  transmit.  The  Karhunen-Loeve  (KL)  Transform  can  also  be  used 
to  find  the  directions  of  maximum  variance. 


Note: the reference cited within the quotation from Pols is Pols' own reference.


Karhunen-Loeve Transform


When  working  with  discrete  data  the  KL  transform  is  best 
understood  from  the  transform  matrix  where  the  rows  are  the  eigenvectors  of 
the  covariance  matrix  of  the  training  set.  The  eigenvectors  selected  for  the 
transform  matrix  are  those  eigenvectors  associated  with  the  eigenvalues  of  the 
covariance matrix of greatest magnitude. Subsequent speech vectors are
multiplied by the transform matrix such that a vector of KL coefficients
results, the number of which is determined by the number of rows used in the
KL transform matrix. That is:

$$\mathrm{Cov} = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T \tag{2}$$

where $x_i$ is the ith input column vector of the training set,
$\bar{x}$ is the vector of component averages, and
$T$ denotes transpose.

$$\bar{x}_k = \frac{1}{N}\sum_{i=1}^{N} x_{ik} \tag{3}$$

where k is the kth frequency component of x, and
N is the number of vectors in the characterization set.

Let

$$X_j = \mathrm{eig}\{\mathrm{Cov}\} \tag{4}$$

be an eigenvector that is associated
with the jth eigenvalue of Cov.

These vectors form an orthogonal 'eigen-space' into which the KL
transform maps subsequent speech vectors:

$$KLC = X\,(x - \bar{x}) \tag{5}$$

where KLC is a vector of n coefficients, X is the
KL transform matrix with n rows, x is an
input vector, and $\bar{x}$ is the vector of means. The inverse transform results from
the vector of KL coefficients being multiplied by the transpose of the KL
transform  matrix.  The  KL  transform  has  the  optimal  properties  of:  minimizing 
mean-square  error  when  a  reduced  number  of  coefficients  are  used;  and, 
minimizing the entropy function defined in terms of the average squared
coefficients  used[15]. 
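
The mean-square-error property can be checked numerically: when the covariance matrix is estimated from the training set with a 1/N normalization, the mean-square reconstruction error over that set equals the sum of the eigenvalues whose eigenvectors were discarded. The sketch below is an illustration on synthetic data; the helper name `truncation_mse` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic stand-in for a training set of 8-dimensional vectors.
data = rng.standard_normal((2000, 8)) @ rng.standard_normal((8, 8))

mean = data.mean(axis=0)
centered = data - mean
cov = centered.T @ centered / len(data)

eigvals, eigvecs = np.linalg.eigh(cov)               # ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # reorder: largest first

def truncation_mse(n_keep):
    """Mean-square error over the training set when n_keep coefficients are kept."""
    kl = eigvecs[:, :n_keep].T           # KL transform matrix with n_keep rows
    recon = (centered @ kl.T) @ kl       # forward transform, then transpose back
    return np.mean(np.sum((centered - recon) ** 2, axis=1))

for n_keep in (2, 4, 6):
    print(n_keep, truncation_mse(n_keep), eigvals[n_keep:].sum())
```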

Chen  and  Huo  1991 

A  recent  use  of  the  KL  transformation  for  speech  compression  was 
that  of  Chen  and  Huo[16].  In  this  study,  they  reported,  "...  up  to  70% 
data reduction is possible with no appreciable degradation of the signal." A
point of interest is the way they represented their speech, which was as
vectors of Fourier-Bessel (FB) coefficients. The reason for using the FB
Transform  was  based  upon  Rabiner  and  Schafer's[17]  linear  speech 
prediction  model  where  the  glottal  pulses  have  the  form  of  a  freely  decaying 
oscillation.  "The  Bessel  function  also  displays  amplitude-decaying  and 
nonuniform  zero  crossing  characteristics.  The  use  of  the  Bessel  function  as 
the  basis  function  for  speech  signal  decomposition  therefore  seems  logical  and 
natural. " [16] 

Chen and Huo[16] performed a number of experiments and reported
their results in terms of reproduction quality and compression ratio. Two of
these results were: for 150 Fourier-Bessel (FB) coefficients excellent quality
was achieved with 10 KL coefficients (99.3% of the energy); and, for 80 FB
coefficients good quality was achieved with 10 coefficients (97.6% of the
energy).

There are at least three important aspects to the experiments and
results of Chen and Huo[16]. They chose the FB transform as they felt it was
a better model for speech production. Qualitative terms such as 'excellent'
and 'good' were used without any specific definitions to allow repeatability
(note that there is no doubt that they achieved excellent and good quality).
The KL transform was successfully used for speech compression.

The  terms  excellent,  good,  bad,  etc  need  to  be  defined  so  that 
some  sort  of  repeatable  datum  can  be  established  for  design  and  verification 
purposes.  What  sort  of  error  metric  will  best  match  the  reproduced  quality 
definitions? 


The FB transform may appear to be a better transform; however,
Chen and Huo report that 70% of the transform's end product was shown to
be redundant. What sort of results should be expected from the best
transform? Can KL give an indication as to which transformation, if any, is
best?

Fourier  Transform 

The DFT is a commonly used technique for representing discrete-time
signals because it is completely invertible and makes otherwise difficult
mathematical operations (convolution and correlation) simpler. The ability
to convolve and correlate signals using the FT makes it a powerful signal
processing technique. The DFT also satisfies intuition as it provides a
measure of a signal's spectral content by correlating the signal with a set of
complex sinusoids[12]. Consequently, intuition and familiarity make the DFT
a comfortable, and powerful, spectral analysis tool. Given that the DFT is
completely invertible for discrete-time applications and preserves the phase
information of a signal, it is logical to use it to represent such signals.

The DFT can be viewed as representing signals on a linear complex
frequency axis. But it is well known that human perception is non-linear.
Parsons[5] describes the MEL scale which approximates the response of human
hearing as being linear below 1 kHz and logarithmic above 1 kHz. It is
therefore reasonable to suggest that speech should be represented by some
non-linear transform which matches human perception rather than the linear FT.
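
As a rough illustration of the non-linearity described above, the sketch below
uses one widely quoted analytic approximation of the mel scale,
m = 2595 log10(1 + f/700). The formula and the name hz_to_mel are assumptions
introduced here for illustration; they are not taken from Parsons[5] or from
the software of this study.

```python
import math

def hz_to_mel(f_hz):
    """Approximate mel value for a frequency in Hz (assumed formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

# Equal 500 Hz steps on the linear axis shrink on the mel axis as frequency rises.
low_step = hz_to_mel(1000.0) - hz_to_mel(500.0)
high_step = hz_to_mel(4500.0) - hz_to_mel(4000.0)
print(low_step, high_step)
```

Under this approximation a 500 Hz step near 4 kHz covers fewer mels than the
same step below 1 kHz, which is the mismatch with the linear DFT axis that the
text refers to.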

Selecting the particular non-linearity is beyond the scope of this
research. However, the DFT may provide some insight into the correct
non-linearity. If a speaker's voice is correctly characterized then the vector
of component averages defined in equation three will describe the average
frequency response of the speaker's voice signals. This information may lead
to a suitable transform. The non-linear transform selected needs to be
completely invertible and so the Discrete Wavelet Transform (DWT) may be
suitable.
Wavelet Transform


The Wavelet Transform (WT) should, in this case, be considered as
an alternative to the FT. The WT offers a means for implementing the
constant-Q filtering technique used by Pols[14] and, due to the logarithmic
frequency scaling, better models human perception (see MEL scale[5]).

Rather than produce a set of cross-correlations between the signal
and a discrete set of complex sinusoids, as with the DFT, the DWT performs
cross-correlations with scaled and shifted versions of the 'mother
wavelet'[18]. The mother wavelet can be considered analogous to a DFT
'window' function except that the operation is one of correlation not
convolution.


Selection of the mother wavelet would be the essential task for
producing quality reproductions and, if properly chosen, the optimization of
the KL transform may not be necessary (assuming such a wavelet exists). The
KL transform as described above requires the 'average spectrum' (equation
three) of a speaker's voice to construct a covariance matrix, and the
eigenvectors of the covariance matrix describe the directions of maximum
variance. This information is derived from the set of vectors that
characterize the speaker's voice and so may be useful for designing a mother
wavelet.

Regardless of the representation of the speech samples, if a
speaker dependent speech model is used then a means for characterizing the
speaker's voice is needed. This characterization will be known as training
set design.



Training  Set  Design 


A problem with the training sets used by Pols[14] and Chen and
Huo[16] is that examples of the testing set were used to characterize the
speakers' voices. Obviously this type of training set is not possible for a
communications system as it suggests the speech must be known before it is
spoken. What is needed is a means for characterizing a speaker's voice
independent of the words actually spoken. A training set comprised of the
range of sounds made would be independent of any words actually compressed.
Pols[14] alluded to this when he referred to using "... sounds within an
utterance..." as the reference for time normalization purposes.

White[9] also describes the role of phonemes in speech
recognition; he writes, "Spoken words can be represented as strings of
phonemes - the basic building blocks of speech. In spoken
English [American] about 38 phonemes (16 vowels and 22 consonants) are
typically used. There are two advantages to recognizing phonemes in a speech
recognition system. Phonemes make possible (1) selective recall of word
prototypes and (2) reduction of memory requirements to store word prototypes -
i.e. data compression."

Pols[14] also made a comment regarding the approximation of the
'sounds which make up an utterance' when commenting on time normalization.
It is difficult to extract Pols' contextual meaning of 'approximate'; however,
it is known that the phonemes of a language are rarely clearly spoken[19].
That is, speakers only approximate a phoneme then transit to making the next
phoneme of the word. Consequently, words actually consist of approximated
phonemes and the transitions. Is there a set of words containing all
possible phonemes and transitions with which a training set can be developed
to characterize speakers' voices?

(The bracketed 'American' in the White[9] quotation above is this author's
comment.)

Pattern recognition studies include the study of training sets.
A number of rules-of-thumb have been developed to choose the size of the
training set with respect to the dimensionality of the feature space (Foley's
Rule[20]) and with respect to the complexity of implementation (Widrow's
Rules[20]). While these rules of thumb are based upon empirical studies, the
underlying theme is that the training set must represent the process being
modelled. So when developing a training set to characterize the voice, all
possible combinations of the sounds must be represented and these combinations
must occur often enough to allow an accurate characterization. Assuming this
training set exists, then how big should it be?

If the training set is based upon phonemes, then there are ≈40
phonemes and ≈1600 phoneme bigrams. So, the training set should include a
number of examples of each of the 1600 phoneme bigrams.
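
The bigram count follows from taking every ordered pair of phonemes. The short
sketch below makes the arithmetic explicit; the placeholder symbols p0..p39
and the function name phoneme_bigrams are hypothetical and stand in for a real
phoneme inventory.

```python
from itertools import product

def phoneme_bigrams(phonemes):
    """All ordered pairs of phonemes, including a phoneme followed by itself."""
    return list(product(phonemes, repeat=2))

# Hypothetical inventory of 40 placeholder symbols p0..p39.
inventory = ["p%d" % i for i in range(40)]
bigrams = phoneme_bigrams(inventory)
print(len(bigrams))
```

With the 38 phonemes quoted from White[9] the same count would be
38 x 38 = 1444, so the figure of 1600 is an upper estimate of the same order.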

An efficient training set design will result in a set of easily
spoken sentences (capturing the speakers' natural way of speaking) with a
uniformly distributed range of phonemes. There may be some minimization
possible by adopting the principles of a grammar. That is, connected-word
recognition can be improved by using the rules of grammar[4]. Grammar
compilers that build training sets based on 'word' bigrams have been shown by
Brown and Wilpon[21] to improve connected word recognition. It may be that
speakers never speak all 1600 phoneme bigrams and that some simplification of
the phoneme based training set is possible by taking this into account.

Summary


It is not clear whether a compression model that preserves the
phase spectrum of a speaker's voice will support reproductions of better
quality than those of LPC.

The DFT is an invertible measure of a discrete-time signal's
complex frequency spectrum. The KL transform is an optimal compression
technique (in a least squares error sense) that has been successfully used for
speech compression. Consequently, the KL transform should support a
compression model that preserves the phase information of the original speech.

The  quality  of  speech  tends  to  be  a  judgement  rather  than  a 
measure  and  so  some  means  of  quantifying  quality  is  needed  if  objective 
comparisons  of  compressions  are  to  be  made. 

Due  to  the  logarithmic  frequency  response  of  human  hearing,  the 
linear  frequency  scale  of  the  FT  may  not  be  the  best  way  of  representing 
speech  for  compression. 

The  following  chapter  details  the  experiments  that  tested  some  of 
the  assumptions  made  and  associated  issues  raised  in  the  proposal. 


CHAPTER  THREE 


EXPERIMENTATION 
Overview


The experiments use the KL transform to compress speech frequency
vectors into KL coefficients. The two primary goals are to test whether
preserving the phase components of the complex frequency vectors influences
reproduced quality and to test the utility of four error metrics as predictors
of reproduced quality.

The  assumptions,  that  a  voice  can  be  characterized  using  a  set  of 
sounds  that  the  speaker  makes  (phonemes)  and  that  the  DFT  is  a  suitable 
representation  of  speech,  will  also  be  tested. 

All  quality  assessments  will  be  made  according  to  those  levels  of 
quality  defined  in  the  proposal. 

Before describing the experiments, a brief functional description
of the software developed is warranted. The software developed for these
experiments performs the following three functions: generate files containing
the covariance matrix and a vector of component averages; decompose the
covariance matrix into a KL transform matrix; and, transform discrete-time
speech files to KL coefficients then reconstruct the speech by inverse
transforming the coefficients.

The programs of Appendices B and C contain all of these
functional modules; however, Appendix C has the modules spread over four
listings and allows for the general case of a characterization set contained
in multiple files. Also included in Appendix C are three other listings
which contain library functions and listings that convert sound files (NeXT
format) to speech data files (integers), and vice versa. Specific tests
have additional modules added and these will be referred to during the
relevant discussions.

Equipment  Used 

The NeXT workstation was used as the speech processing platform.
Speech was sampled at 11.02 kHz with 16-bit linear quantization using the
NeXT's Sound Recorder software. Compression simulations were written in C
and are attached in the Appendices.

Influence  of  Phase 

This  experiment  will  test  the  influence  of  phase  on  the  quality  of 
reproduced  speech  for  two  cases.  The  first  case  (listing  of  Appendix  A)  does 
not  use  compression  and  simply  replaces  the  real  and  imaginary  components  of  a 
speech  vector  with  the  magnitude  (real)  and  zero  (imaginary).  The  second 
case  (listing  of  Appendix  B)  tests  the  influence  of  phase  on  reconstructed 
speech  that  has  been  compressed  and  then  expanded. 

Without  Compression 

The objective of the experiment is to establish whether preserving
the phase of speech signals influences the quality of reconstructed speech.
The quality levels defined in the proposal (page 9) are used as references.

There are two steps to the testing process. Step one is to
simply listen to the unprocessed speech and evaluate it for quality. Step
two calculates the magnitude of the complex frequency components and loads
these values into the real locations of the Fourier transform input buffer.
The imaginary locations are set to zero and the inverse Fourier transform is
performed. This discrete-time signal is replayed and evaluated for quality.
The program of Appendix A was written for step two.
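
Step two can be sketched as follows. This is a minimal illustration and not
the listing of Appendix A: it uses a direct O(n^2) DFT on a short made-up
frame rather than a fast transform on 256-point vectors, and the function
names dft, idft and magnitude_only are assumptions.

```python
import cmath
import math

def dft(x):
    """Direct discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Direct inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def magnitude_only(x):
    """Load magnitudes into the real parts, zero the imaginary parts, invert."""
    mags = [complex(abs(c), 0.0) for c in dft(x)]
    return [v.real for v in idft(mags)]

# Made-up 8-sample frame standing in for a speech vector.
frame = [0.0, 1.0, 0.0, -1.0, 0.5, 0.0, -0.5, 0.0]
rebuilt = magnitude_only(frame)
print(rebuilt)
```

The rebuilt frame has the same magnitude spectrum as the input but a different
waveform, because every component has been forced to zero phase.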

The above test provides an expectation benchmark for subsequent
tests as well as providing an indication of the influence of phase on quality.
However, a more informative experiment is to test the influence of phase on
the compression model. Consequently, the above comparison needs to be
repeated with the compression and expansion simulation.

With  Compression 

Appendix  B  contains  a  listing  that  compresses  the  magnitude-only 
frequency  coefficients.  Compression  is  a  more  laborious  task  as  a  vector  of 
component  averages  and  a  covariance  matrix  must  first  be  determined.  The 
covariance  matrix  is  decomposed  into  singular  values  and  eigenvectors  and  the 
KL  transformation  matrix  constructed.  Then  the  original  speech  can  be 
compressed  and  expanded  using  the  KL  transformation.  The  phase  preservation 
software of Appendix C is more complex only in that two sets of average,
covariance,  KL  transform  and  KL  coefficients  are  used  to  reproduce  the  real 
and  imaginary  components  of  the  speech  vectors. 

Functional descriptions of the four listings at Appendix C are as
follows. Listing one transforms the discrete-time samples of data files to
256-point frequency vectors and saves the vector of component averages to a
file along with the number of vectors used to determine these averages.
Subsequent data files are then used to update this vector of averages.
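
One way to update a stored average from later files, given only the saved
vector and the saved count, is an incremental mean. The sketch below is an
assumption about how such an update could be written, not a transcription of
listing one; update_mean and the two small "files" are hypothetical.

```python
def update_mean(mean, count, vectors):
    """Fold a new batch of vectors into a running component-wise mean.

    mean  -- current vector of component averages (ignored when count is 0)
    count -- number of vectors already averaged
    Returns the updated (mean, count) pair.
    """
    for v in vectors:
        count += 1
        if count == 1:
            mean = list(v)
        else:
            mean = [m + (x - m) / count for m, x in zip(mean, v)]
    return mean, count

# Two made-up "files" of 2-component frequency vectors.
file_one = [[1.0, 2.0], [3.0, 4.0]]
file_two = [[5.0, 6.0]]
mean, count = update_mean(None, 0, file_one)
mean, count = update_mean(mean, count, file_two)
print(mean, count)
```

Saving the count alongside the averages is what makes the second call
possible without re-reading the first file.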

Listing two determines a covariance matrix by re-reading the
discrete-time data files, transforming to frequency vectors and subtracting
the average spectrum from each frequency vector. The outer product of the
difference between the input and average frequency vector is taken and added
to the covariance matrix (initialized to zero).
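
The accumulation described for listing two can be sketched as below. The
final division by the number of vectors is an assumption, since the
description does not state the normalization; scaling the matrix does not
change its eigenvectors. The name accumulate_covariance and the sample data
are hypothetical.

```python
def accumulate_covariance(vectors, mean):
    """Sum the outer products (x - mean)(x - mean)^T, then divide by the count."""
    n = len(mean)
    cov = [[0.0] * n for _ in range(n)]  # covariance matrix initialized to zero
    for x in vectors:
        d = [xi - mi for xi, mi in zip(x, mean)]
        for r in range(n):
            for c in range(n):
                cov[r][c] += d[r] * d[c]
    count = len(vectors)  # assumed normalization
    return [[value / count for value in row] for row in cov]

# Made-up training vectors and their component averages.
training = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
average = [3.0, 4.0]
cov = accumulate_covariance(training, average)
print(cov)
```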

After all data files are read and the covariance matrix formed,
the covariance matrix, A, is decomposed using the software of listing three
into three matrices V, W and V^T. The columns of V are the eigenvectors of
the covariance matrix. The non-zero elements of the diagonal matrix W contain
the singular values (squareroots of the eigenvalues) which correspond to the
eigenvectors of V. V^T is the transpose of V and is saved as the KL transform
matrix.

Listing four requests the number of coefficients used in the
compression model and this number is used to select the number of rows of the
KL transform matrix. Frequency vectors are multiplied by the KL transform
matrix to yield the vector of coefficients. The coefficients are then
multiplied by the transpose of the KL transform to form the reconstructed
frequency vector (note that the transpose of the KL transform matrix equals
its inverse as the matrix is constructed from orthonormal eigenvectors).
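
The compress and expand steps can be illustrated with a small sketch. It is
not listing four: the 3 x 3 matrix below is a hand-made orthonormal matrix
standing in for a real KL transform matrix, the mean is subtracted and
restored as in equation five, and the names compress and expand are
assumptions.

```python
def compress(kl_rows, x, mean):
    """Project (x - mean) onto the retained rows of the KL transform matrix."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    return [sum(r * v for r, v in zip(row, d)) for row in kl_rows]

def expand(kl_rows, coeffs, mean):
    """Multiply the coefficients by the transpose of the rows, restore the mean."""
    n = len(mean)
    return [mean[k] + sum(c * row[k] for c, row in zip(coeffs, kl_rows))
            for k in range(n)]

# Hypothetical orthonormal rows (a rotation), not a matrix from the study.
kl_matrix = [[0.6, 0.8, 0.0], [-0.8, 0.6, 0.0], [0.0, 0.0, 1.0]]
average = [1.0, 1.0, 1.0]
x = [4.0, 3.0, 1.5]

full = expand(kl_matrix, compress(kl_matrix, x, average), average)
reduced = expand(kl_matrix[:2], compress(kl_matrix[:2], x, average), average)
print(full, reduced)
```

With all three rows the input is recovered; with two rows the component along
the discarded row falls back to its average value.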

Just as in the phase preservation experiment without compression,
this experiment has two steps. The first step is to apply the KL transform
on the "magnitude only" frequency vectors and the second step preserves the
phase of the speech vector. The listings of Appendix C preserve phase by
treating each complex vector as a pair of vectors - one vector represents the
real part of the frequency components and the other the imaginary parts.
Then two vectors of averages, two covariance matrices, two KL transform
matrices and two vectors of KL coefficients are required. The discrete-time
reconstructions are replayed and their quality judged.
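
The vector-pair representation amounts to the following split and
recombination. The function names and the three-component spectrum are
hypothetical; in the experiments each of the two real vectors would be
compressed and expanded separately before being recombined.

```python
def split_complex(freq_vector):
    """Return the real parts and the imaginary parts as two real vectors."""
    return [c.real for c in freq_vector], [c.imag for c in freq_vector]

def join_complex(real_parts, imag_parts):
    """Recombine two real vectors into one complex frequency vector."""
    return [complex(r, i) for r, i in zip(real_parts, imag_parts)]

# Made-up complex frequency vector.
spectrum = [complex(2.0, 0.0), complex(1.0, -1.0), complex(0.0, 0.5)]
re_vec, im_vec = split_complex(spectrum)
restored = join_complex(re_vec, im_vec)
print(re_vec, im_vec)
```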

The advantage of representing a vector of complex numbers as dual
vectors of real numbers is that half the computations are required to
decompose two real covariance matrices as compared to one complex
matrix[22].

Assuming  that  the  KL  transform  provides  optimal  reconstructions  of 
the  two  vectors,  the  relationship  between  the  vectors  (i.e.  the  phase)  will  be 
preserved  to  some  optimal  accuracy  for  the  number  of  KL  coefficients  used. 

Error  Metrics 

Four error metrics, Mean Square Error (MSE), Relative MSE (RMSE),
RMSE per coefficient and Condition Number (CN), will be tested for their
utility in predicting reconstruction quality. MSE is defined in equation
six.

    MSE = \sqrt{\sum_i E_i^2}        (6)

    where E_i = \frac{1}{N} \sum_{n=1}^{N} (REC_i^{(n)} - ORG_i^{(n)})

is the ith component of the error vector, and
N is the number of vectors.

The RMSE is defined below in equation seven.

    RMSE = \sqrt{\sum_i E_i^2}        (7)

    where E_i = \frac{1}{N} \sum_{n=1}^{N} \frac{REC_i^{(n)} - ORG_i^{(n)}}{ORG_i^{(n)}}

is the ith component of the error vector, and
N is the number of vectors.

The RMSE per coefficient is the RMSE divided by the number of coefficients
used for the reconstruction. The CN is defined below in equation eight.

    CN_x = \frac{SV_{max}}{SV_x}        (8)

where SV_{max} is the singular value with greatest magnitude, and
SV_x is the xth singular value.
Mean Square Error

The MSE measures the reconstruction error with respect to the
original frequency vector. Consequently, listing four of Appendix C should
be replaced by the listing of Appendix D to accumulate the error over
the range of speech data transformed. At the end of the program the
components of the error vector are averaged over the total number of frequency
vectors transformed. The 2-norm magnitude of the error vector is then
calculated as described by equation six. Once the MSE is calculated, it is
written to an error file. Note that, because complex frequency vectors are
represented by two vectors of real numbers (one for the real and one for the
imaginary parts of the frequency vector) there are two error spectrums and two
MSE measurements. Both are written to the error file and are later added
to form a metric.

Relative  Mean  Square  Error 

The  RMSE  measures  the  reconstruction  error  with  respect  to  the 
original  frequency  vector.  Consequently,  listing  four  of  Appendix  C  should 
be  replaced  by  the  listing  of  Appendix  E  to  accumulate  the  relative  error  over 
the  range  of  speech  data  transformed.  At  the  end  of  the  program  the 
components  of  the  error  vector  are  averaged  over  the  total  number  of  frequency 
vectors  transformed.  The  2-norm  magnitude  of  the  error  vector  is  then 
calculated  as  described  by  equation  seven.  Once  the  RMSE  is  calculated,  both 
measurements  are  written  to  an  error  file  and  are  summed  for  a  metric. 
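
The procedure described for the MSE and RMSE (average each component's error
over all vectors, then take the 2-norm of the mean error vector) can be
sketched as below. This is one reading of the prose description and not the
listings of Appendices D and E; the function names and the sample data are
assumptions, and the relative form assumes no original component is zero.

```python
import math

def mean_error_vector(originals, reconstructions, relative=False):
    """Average the per-component error over all vectors.

    With relative=True each error is divided by the original component,
    which is assumed to be non-zero.
    """
    n_vectors = len(originals)
    n_comp = len(originals[0])
    mean_err = [0.0] * n_comp
    for org, rec in zip(originals, reconstructions):
        for k in range(n_comp):
            err = rec[k] - org[k]
            if relative:
                err /= org[k]
            mean_err[k] += err / n_vectors
    return mean_err

def two_norm(v):
    """2-norm magnitude of a vector."""
    return math.sqrt(sum(e * e for e in v))

# Made-up data: one large and one small frequency component.
originals = [[10.0, 0.1], [10.0, 0.1]]
reconstructions = [[9.0, 0.3], [9.0, 0.3]]
mse = two_norm(mean_error_vector(originals, reconstructions))
rmse = two_norm(mean_error_vector(originals, reconstructions, relative=True))
print(mse, rmse)
```

In this made-up case the absolute metric is dominated by the large component,
while the relative metric is dominated by the small one.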

RMSE  Per  Coefficient 

RMSE per Coefficient is the RMSE calculation divided by the number
of coefficients used to generate the reconstructed vectors. By replacing
listing four of Appendix C with that of Appendix F the RMSE per coefficient
will be written to the error file.

Condition Number

The positive squareroots of the eigenvalues of the covariance
matrix are the singular values of the matrix[23]. The squares of the
eigenvalues are measurements of the energy represented by the respective KL
coefficient[15] (inner product of the associated eigenvector and the frequency
vector). Therefore, the ratio of the largest singular value to the nth
singular value, or condition number for n singular values, is proportional to
the energy represented by n KL coefficients. By determining the condition
number from the decomposition of the covariance matrix and associating it with
quality levels (average of CN for both covariance matrices), a means for
associating compression ratio and reproduced quality exists.

Appendix G lists the code for the CN measurements. The code that
decomposes the covariance matrix and saves the KL transforms (listing three of
Appendix C) has been supplemented with code that calculates the CN and saves
it to a file (i.e. Appendix G).
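
Given the singular values from the decomposition, the CN of equation eight
and the number of coefficients that reaches a chosen CN can be computed as
sketched below. This is an illustration, not the code of Appendix G; the
function names are assumptions, the singular values are made up, and all of
them are assumed to be positive.

```python
def condition_numbers(singular_values):
    """CN for each count n: largest singular value divided by the nth largest."""
    ordered = sorted(singular_values, reverse=True)
    largest = ordered[0]
    return [largest / sv for sv in ordered]

def coefficients_for_cn(singular_values, target_cn):
    """Smallest number of coefficients whose CN reaches target_cn, or None."""
    for n, cn in enumerate(condition_numbers(singular_values), start=1):
        if cn >= target_cn:
            return n
    return None

# Hypothetical singular values, not measured data.
sv = [400.0, 100.0, 20.0, 4.0, 2.0, 1.0]
cns = condition_numbers(sv)
needed = coefficients_for_cn(sv, 200.0)
print(cns, needed)
```

Because the singular values are sorted in decreasing order, the CN never
decreases as more coefficients are retained.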

Voice  Characterization 

The assumption was that a voice could be characterized by a set of
phoneme bigrams. In order to generate the 1600 (or so) phoneme bigrams, a
comprehensive study of phonemics is required which is beyond the scope of this
thesis. However, the underlying principle can be tested by forming a
compression model (KL transform matrix) from a training set of words
containing particular sounds and then compressing other words that can be
constructed from these same particular sounds. For example, the words
'black' and 'clock' can represent the sounds /bl/, /lack/, /cl/ and /lock/ and
so be used to characterize the word 'block'.

The  training  set  is  sampled  and  formed  into  discrete-time  data 
files.  The  software  of  Appendix  C  is  used  to  form  the  voice  model  and 
simulate  the  compression.  The  reconstructed  data  files  are  played  back  for  a 
quality  assessment  and  the  number  of  coefficients  used  is  noted.  Appendix  G 
contains  a  full  list  of  training  and  testing  data. 

Suitability  of  The  DFT  and  KL  Transforms 

The  assumption  that  KL  transforming  the  DFT  is  a  suitable  means 
for  compressing  speech  should  be  examined.  The  reason  for  examining  the  DFT 
is  based  upon  the  disparity  between  the  linear  frequency  scale  of  the  DFT  and 
the  non-linear  frequency  nature  of  human  hearing. 

The eigenvectors of the covariance matrix provide the directions
of greatest variance ranked according to the magnitude of the associated
eigenvalues. The relationships described by the eigenvectors' directions may
require analysis if any meaningful conclusions regarding the DFT are to be
made (beyond the scope of the thesis). However, prior to forming the
covariance matrix an average spectrum is determined from the frequency vectors
of the training set. If the speaker's voice is properly characterized, then
analysis of the average spectrum may provide a starting point for determining
an alternative to the DFT and/or KL transforms.


CHAPTER  FOUR 


DISCUSSION  OF  RESULTS 
Phase and Quality


The influence of phase on reproduced quality was tested by
comparing the results of two tests. The first test simply removed the phase
components from the complex Fourier coefficients and reproduced the
discrete-time speech using the 'magnitudes only'. The reproduced speech was
assessed for its quality.

The second experiment tests the influence of phase within the
compression model. Vectors of Fourier coefficients are transformed to sets
of KL coefficients and then inverse-transformed back to frequency vectors.
The experiment is performed for the 'magnitude only' and complex cases, after
which the reproduced discrete-time speech is assessed.


Phase  Removal 


Removing the phase components from the complex frequency vectors
produced intelligible speech. That is, the speech could be understood but it
was distorted so badly that the speaker could not be recognized. Figures one
and two show the original and reproduced speech used for the test.


Figure  2  Reproduced  Speech,  Magnitudes  Only 


Figures one and two are known to have the same power spectrums.
Also, they are similar in that at coincident times there is a similar amount
of high and low frequency activity in the reproduced and original waveforms.
However, these two signals sound vastly different.

By  judging  the  original  and  reproduced  speech,  it  is  clear  that 
phase  does  influence  quality.  Therefore,  if  a  compression  technique  is  to 
maintain  the  quality  aspects  of  speech,  it  should  preserve  the  phase 
information. 
Compression


The compression experiments are similar to the previous
experiments. Test one compresses the 'magnitude only' spectrum while the
second series of compression tests preserves the phase information of the
original. Figure three shows the discrete-time plot of the original speech
used for the compression tests.


Figure 3 Speech used for Compression Tests


Magnitudes  Only 

The 'magnitude only' reproductions are shown at figures four and
five below. The term "No Compression" means that the KL Transformation did
not introduce any compression. However, there is an inherent two to one
compression due to the even nature of the 'magnitude only' spectrum. The two
to one compression of the KL transform produced the discrete-time waveform of
figure five that looks, and sounds, very much like the uncompressed 'magnitude
only' speech of figure four. This result suggests that the KL transformation
discards the least important aspects of the speech waveform.


As the 'magnitude only' representation can only at best support
intelligible reproductions and the objective is to test phase preservation, no
further compression tests of the 'magnitude only' spectrum are performed.


Figure six above shows the discrete-time waveform for an eight to
one compression. Notice that the waveform closely resembles the original
speech of figure three as there are similar, coincident in time, high and low
frequencies. Further, the reproduced waveshape is as balanced about the
amplitude equals zero axis as the original speech waveform. The eight to one
compression waveform was chosen as it was judged that this compression ratio
was the highest for excellent quality reproductions (could not distinguish
between original and reproduction). Good quality speech (speaker clearly
recognized but speech noisy or distorted) was reproduced for compression
ratios up to 20 to one and intelligible speech was achieved for compressions
of approximately 26 to one.


The  results  of  this  experiment  show  that  the  KL  transform  will 
compress  speech  effectively  and  that  phase  can  be  preserved  by  representing 
the  complex  frequency  vector  as  two  vectors  of  real  numbers  (one  for  the 
imaginary  components  and  one  for  the  real  components). 

Can  the  reproduced  quality  be  predicted  for  a  particular 
compression  ratio?  Alternatively,  can  reproduced  quality  be  specified  and  a 
corresponding  compression  ratio  selected  to  meet  that  specification? 

Error  Metrics 

An  error  metric  needs  to  be  a  single  number  that  can  be  used  to 
achieve  repeatable  results.  The  four  metrics  chosen  will  each  produce  two 
numbers  as  the  complex  frequency  vectors  are  represented  as  a  vector  pair. 
These  two  measurements  can  be  added  together  to  give  the  single  metric. 

Mean  Square  Error 

The MSE was measured by averaging the error for each frequency
component over all frequency vectors, then taking the magnitude of the mean
error vector. The plot of figure seven shows the MSE for an increasing
number of coefficients. The reduction in MSE for an increasing number of KL
coefficients is consistent with the literature and intuitive as more
coefficients increase the degrees of freedom and so 'better' approximations
can be supported.
Figure 7 MSE versus Number of KL Coefficients


The problem with the MSE is that the magnitude is only meaningful
if the size of the vectors transformed is known.

Relative  Mean  Square  Error 

Figure eight shows the Relative MSE for increasing numbers of KL
coefficients. Notice that the RMSE increases between 16 and 32 coefficients
which is at odds with the previous MSE result. Also, it is known from the
previous experiment that the reproduced quality is excellent for that range of
KL coefficients. The reason is easily explained by realizing that the RMSE
applies an equal weighting to each frequency component of the vector
transformed by normalizing each component (see equation seven). Then very
low amplitude frequencies, which may contribute little to the quality of the
reproductions, can be reproduced with high relative errors but still very low


Relative  Mean  Square  Error  per  Coefficient 

After examining the RMSE curve of figure eight, it was decided to
evenly distribute the RMSE over the number of coefficients used for the
transformation. This was an attempt at producing a metric that conformed
with intuition by decreasing with increasing coefficients. The results
obtained were similar to those of the RMSE tests except scaled by the number
of coefficients.
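
The effect of the normalization can be illustrated with a short sketch. Equation seven is not reproduced in this section, so the form used below (each component's error divided by the original component, then squared and summed) is an assumption drawn from the description above, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of squared relative errors over one vector. Components whose
   original value is zero are skipped to avoid division by zero
   (an assumption of this sketch). */
double relative_sq_error(const double *orig, const double *recon, size_t dim)
{
    size_t c;
    double total = 0.0;

    assert(orig != NULL && recon != NULL);
    for (c = 0; c < dim; c++) {
        if (orig[c] != 0.0) {
            double r = (orig[c] - recon[c]) / orig[c];
            total += r * r;
        }
    }
    return total;
}

/* "Per coefficient" variant: the same quantity spread evenly over the
   number of KL coefficients kept. */
double relative_sq_error_per_coeff(const double *orig, const double *recon,
                                   size_t dim, size_t kl_coeffs)
{
    assert(kl_coeffs > 0);
    return relative_sq_error(orig, recon, dim) / (double)kl_coeffs;
}
```

With an original vector of {100.0, 0.01} and a reproduction of {99.0, 0.02}, the low amplitude second component contributes a relative term of about 1 while the first contributes about 0.0001, even though the absolute error of the second is one hundred times smaller. This is the equal weighting that lets low amplitude frequencies dominate the RMSE.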


Condition Number


Figure  9  Condition  Number  versus  Number  of  KL  Coefficients 

Figure nine shows condition number (CN) plotted against the number of
KL coefficients. The curve has a slope of uniform sign (positive), which
satisfies intuition, and once quality levels are associated with particular
CN values, compression and quality can be related. This compression-quality
association can be made directly from the decomposed covariance matrix. As
an example, consider the threshold for excellent quality established by the
phase preservation tests at 16 KL coefficients. Reading from the curves of
figure nine, 16 coefficients corresponds to a CN of 145 for speaker one and
180 for speaker two. If this result held for all speech (it would if the
voice was properly characterized), then the number of coefficients
associated with a CN of, say, 200 would be needed for excellent quality.
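
Selecting the number of coefficients from a target CN can be sketched as below. The text does not state the CN formula explicitly, so this sketch assumes the CN for k retained coefficients is the ratio of the largest singular value to the k-th singular value of the decomposed covariance matrix, with the values already ranked in descending order. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Condition number when the first k ranked singular values are kept
   (assumed definition: largest value divided by the k-th value). */
double condition_number(const double *w, size_t k)
{
    assert(w != NULL && k >= 1 && w[k - 1] > 0.0);
    return w[0] / w[k - 1];
}

/* Smallest number of coefficients whose CN reaches target_cn.
   A zero singular value is treated as reaching any target.
   Returns n if the target is never reached. */
size_t coeffs_for_cn(const double *w, size_t n, double target_cn)
{
    size_t k;

    assert(w != NULL && n >= 1);
    for (k = 1; k <= n; k++) {
        if (w[k - 1] <= 0.0 || w[0] / w[k - 1] >= target_cn)
            return k;
    }
    return n;
}
```

Under this assumption, a quality threshold expressed as a CN (200 in the example above) maps directly to a coefficient count once the training covariance matrix has been decomposed.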

Also notice that the CNs for the two speakers are close right up to
the region where excellent quality begins. There is no perceivable
improvement above 16 coefficients (a CN of approximately 200), so the energy
associated with the higher order coefficients is low. Consequently, the CNs
vary by greater amounts due to the widely varying low energy frequency
components.

Voice  Characterization 

It was assumed that a speaker can be characterized by a
comprehensive set of phonemes. To test the concept the system will be
trained on one set of words and then used to reproduce a second set, made up
of similar utterances but different words. If the second set is reproduced as
accurately as the training set, the assumption will be considered reasonable.
However, a much more exhaustive study using a comprehensive training set will
still be required. As a matter of interest the technique will also be used
on multiple speakers.

Single  Speaker 

An individual KL transform was computed for each of the two
speakers and used to transform the testing sentence of Appendix H. The
results of each test were the same. That is, the threshold for excellent
quality occurred at 16 coefficients (note the similarity with the phase
preservation experiments).

One point of interest was the reproduction of the fricative /f/ in
the word fly. As the number of coefficients was decreased below 16, the
reproduction became noisier, as in the phase preservation experiment.
However, for this experiment, the /f/ sound became noisy at a higher number
of coefficients than in the previous tests. Apart from that, the testing
sentence was transformed as well as any of the sentences from the training
set.


Multiple  Speakers 

Figure  nine  shows  the  energy  curve  for  two  speakers  which  was 
derived  from  each  one  speaking  the  eight  training  sentences  of  Appendix  H. 
When  four  or  more  coefficients  are  used,  the  energy  levels  are  within  1%  of 
each  other.  When  more  than  eight  are  used  the  two  energy  curves  are  less 
than  0.1%  from  each  other. 

When the dual speaker KL Transform was used on each speaker's
testing sentence (Appendix H), the resulting reproduced quality sounded no
different from that achieved when each of the single speaker KL Transforms
was used. That is, the threshold of excellent quality was again 16
coefficients and the fricative /f/ was distorted more than the voiced sounds
when fewer than 16 coefficients were used.


Summary


Removing the phase information from frequency vectors and only
using the magnitudes will produce intelligible speech but not toll quality
speech.

Excellent  quality  can  be  achieved  with  compression  ratios  of  up  to 
eight  to  one  by  preserving  phase  and  using  the  KL  transform. 

The  MSE  decreases  with  decreases  in  compression  ratio;  however, 
the  RMSE  and  RMSE  per  Coefficient  do  not  vary  consistently  with  compression 
ratio. 


The condition number is related to the signal energy ranked by the
KL Transform and increases with decreasing compression ratio.

Words  can  be  transformed  without  the  compression  model  being 
specifically  trained  on  those  words.  That  is,  the  training  set  need  only 
contain  the  sub-words  (utterances,  syllables,  etc)  to  be  transformed. 

The KL Transform compression model could also transform multiple
voices.


CHAPTER  FIVE 


CONCLUSIONS  AND  RECOMMENDATIONS 
Influence of Phase on Quality

A complex Fourier transform produces, for a set of frequencies,
magnitudes which represent the amount of energy at each frequency and
phases which represent the time relationship between the frequencies of the
set. Consequently, when the magnitudes are maintained and the phase set to
zero, the time relationships between the frequency components of the utterance
are lost. When this time relationship is discarded, toll quality speech (i.e.
no processing) is reduced to intelligible speech. Therefore, phase does
influence the reproduced quality of speech.

There are at least two implications of this conclusion. Firstly,
if toll quality communications are to be achieved with compressed speech, some
method for preserving the phase information is needed. The second
implication concerns speech recognizers. If speakers are to be recognized
then preserving the phase information will be required. For speaker
dependent recognizers, including the phase information in the training pattern
(or equivalent technique) should enhance performance as there are more degrees
of freedom with which to separate individual patterns.

Further to the phase preservation results is the technique used to
preserve phase. Representing a vector of complex numbers as a vector pair of
real numbers proved effective. As each vector of the pair is reproduced with
minimum MSE, reconstructing with this pair provides a close
representation of the original speech. The utility of the technique is that
it is not as computationally expensive as working with a KL Transform
consisting of complex eigenvectors for rows and a frequency vector consisting
of complex components. However, phase preservation may not be suitable for
communications applications as it might be difficult to transmit the 33
coefficients over narrowband channels.

Error  Metrics 

Of the four error metrics investigated only the MSE and ER were
sensible proposals. The MSE metric is not suitable as it requires some prior
knowledge of the amplitude of the original speech for the MSE measurement to
be useful. This prior knowledge could be obtained after the compression
system is trained. Testing data (different to that of the training set)
could be passed through the system, the average MSE measured for the range of
compression ratios, and then a compression ratio that matches the required
quality performance selected.

The CN is probably the most useful metric. Assuming the
speaker's voice is properly characterized, the CN can be derived directly
from the training set without any testing data being required, although
some prior knowledge of the association between quality levels and CN is
necessary. A suitable compression ratio can then be selected according to
the performance criteria. The results of the CN experiments agree with those
reported by Chen and Huo [16] (i.e. 98% energy needed for excellent quality) as
a CN of 200 corresponds to approximately 99% of the speech energy.
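
The energy comparison can be made concrete with a short sketch. It assumes the energy associated with each ranked KL direction is available as an array of non-negative values (for example the eigenvalues of the covariance matrix); the function name `energy_fraction` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Fraction of the total energy captured by the first k of n ranked
   directions. energy[] holds one non-negative value per direction,
   largest first. */
double energy_fraction(const double *energy, size_t n, size_t k)
{
    size_t i;
    double kept = 0.0, total = 0.0;

    assert(energy != NULL && n >= 1 && k <= n);
    for (i = 0; i < n; i++) {
        total += energy[i];
        if (i < k)
            kept += energy[i];
    }
    assert(total > 0.0);
    return kept / total;
}
```

Evaluating this fraction at the coefficient count that corresponds to a CN of 200 is one way to check the figure of approximately 99% against the 98% reported in [16].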


Voice  Characterization 


Voices can be characterized by training sets that consist of sub-word
utterances. This means that a speaker dependent compression model can
be  developed  on  a  training  set  of  sub-words  and,  so  long  as  those  sub-words 
are  representative  of  the  speech  to  be  compressed,  subsequent  speech  can  be 
compressed  and  reproduced  with  toll  quality. 

The  implication  of  the  characterization  result  is  that  speech  can 
be  preprocessed  using  the  KL  Transformation  into  a  space  of  smaller  dimension 
and  not  lose  any  of  its  quality.  Consequently,  applications  such  as  speech 
recognition  and  speaker  identification  can  be  performed  in  the  reduced  space. 
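
The projection into the reduced space, and the expansion back, can be sketched as follows. This is an illustration under stated assumptions and not the thesis listing: the ranked eigenvectors are assumed to be stored as the rows of a row-major matrix, indexing is zero-based, and the names `kl_compress` and `kl_expand` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Project the mean-removed vector x (length dim) onto the first k
   eigenvectors. rows[j*dim + i] is component i of eigenvector j. */
void kl_compress(const double *rows, const double *mean, const double *x,
                 size_t dim, size_t k, double *coeff)
{
    size_t i, j;

    assert(rows != NULL && mean != NULL && x != NULL && coeff != NULL);
    assert(k <= dim);
    for (j = 0; j < k; j++) {
        coeff[j] = 0.0;
        for (i = 0; i < dim; i++)
            coeff[j] += rows[j * dim + i] * (x[i] - mean[i]);
    }
}

/* Rebuild an approximation of x from k coefficients: the inverse of an
   orthonormal transform is its transpose, then the mean is added back. */
void kl_expand(const double *rows, const double *mean, const double *coeff,
               size_t dim, size_t k, double *x_hat)
{
    size_t i, j;

    assert(rows != NULL && mean != NULL && coeff != NULL && x_hat != NULL);
    assert(k <= dim);
    for (i = 0; i < dim; i++) {
        x_hat[i] = mean[i];
        for (j = 0; j < k; j++)
            x_hat[i] += rows[j * dim + i] * coeff[j];
    }
}
```

A recognizer or speaker identifier working in the reduced space would use the `coeff` vector directly and would not need the expansion step.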

The examples produced by this thesis show that speech vectors of
256 discrete-time samples can be represented with 33 coefficients. The
number of coefficients can be predicted by the CN metric and experience.
That is, toll quality requires a CN of 200 which is represented by 16
coefficients in each vector of the pair plus the DC value.
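
The arithmetic behind these figures can be written out as a small sketch; the function names are hypothetical and the only inputs are the numbers quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Coefficients sent per frame: one set for the real vector, one set for
   the imaginary vector, plus the DC value. */
size_t total_coefficients(size_t per_vector)
{
    return 2 * per_vector + 1;
}

/* Ratio of time samples per frame to coefficients per frame. */
double compression_ratio(size_t samples, size_t per_vector)
{
    assert(samples > 0);
    return (double)samples / (double)total_coefficients(per_vector);
}
```

With 16 coefficients per vector this gives 2 x 16 + 1 = 33 coefficients, and 256 / 33 is approximately 7.76, which is the ratio quoted as roughly eight to one.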

Optimum  Speech  Transform 

The KL Transform performed as predicted by the literature in that
Chen and Huo's [16] results were reproduced and MSE decreased with increasing
KL coefficients as predicted by Tou and Gonzalez [15]. So it is clear that the
KL Transform is a suitable technique for speech compression.

A result of interest is that the number of KL coefficients was
only 1/8th the number of the Fourier coefficients representing the original
speech vector. Therefore, the Fourier Transform is not the optimal means for
representing the original speech.

The optimal means for representing the original speech might be
found from the KL Transformation. Assume that a voice has been completely
characterized by some training set of frequency vectors (complex or pairs of
reals). Singular Value Decomposition (or some other eigen-decomposition)
produces the directions of maximum variance (eigenvectors) through the space
(or spaces), and the energy associated with each direction (singular values)
is also known. In order to generate the covariance matrix the average spectrum
is determined. The question becomes: is there enough information in the
directions of maximum variance and the average spectrum with which to
determine the optimum transform?

The  answer  is  beyond  the  scope  of  this  thesis  but  it  might  be  that 
the optimal transform can be found analytically. A completely invertible
compact  transform  where  the  energy  was  spread  evenly  among  the  coefficients 
would  be  the  desired  result. 

Recommendations

The  recommendations  concern  applications  involving  the  KL 
Transform  and  areas  for  further  study. 

Applications 

In order to achieve quality reproductions, the phase information
associated with the complex frequency vectors needs to be preserved.

For applications such as speaker dependent speech recognition and
speaker identification, the KL transform can be used to achieve feature
reduction (i.e. by KL transforming the Fourier Transform) without loss of
quality.


Communications  applications  should  only  use  the  KL  Transform 
techniques  when  the  application  allows  speaker  independence  to  be  traded  off 
against  reproduced  quality. 

The  storage  of  large  quantities  of  speech  data  could  also  use  the 
technique  as  parts  of  the  speech  could  be  used  to  characterize  the  voice  and 
the  whole  speech  sequence  transformed  for  storage. 

Further  Study 

There  are  two  areas  for  further  study.  Firstly,  the  phonemic 
characterization  of  voices  should  be  investigated  further  as  the  potential 
exists  for  training  sets  to  be  developed  independent  of  the  application. 

Secondly,  finding  the  optimal  transform  for  speech  would  allow 
more  efficient  compression  models  to  be  developed. 
PHASE REMOVAL SIMULATION: WITHOUT COMPRESSION


/*

Command Line Inputs : remove speech.dat

The program reads the data file of discrete-time samples and Fourier
transforms the data to complex frequency vectors. The magnitudes of the complex
components are determined and loaded back into the real locations of the FFT
buffer while the imaginary locations are set to zero.

The inverse Fourier transform is performed and the reconstructed data
written out to a file.

Outputs : nophase.dat

Program written by FLTLT Don Dryley at AFIT September 1992.

*/


#include <fcntl.h>
#include <stdio.h>
#include <math.h>
#include <string.h>
#include <stdlib.h>
#include "recipes.h"

#define MAX_AMP 32768
#define N 256
#define n 128


void main(argc, argv)
int argc;
char *argv[];
{
    /* DECLARE VARIABLES */
    FILE *inhandle, *outhandle;
    int i, j, k, vectors, integers_read, tempint;
    long *buff;
    double *d, *data_mag, DC;
    char outfile[] = "nophase.dat";

    /* CHECK ARGUMENTS */
    if (argc != 2)
    {
        printf("Format: D>KLT source.xxx");
        exit(-1);
    }


    /* CREATE ARRAYS, INITIALISED TO ZERO */
    buff = lvector(1,N);        /* input buffer */
    d = dvector(1,2*N);         /* FFT of one frame of speech is complex */
    data_mag = dvector(1,n);    /* Single frequency vector of magnitudes */


    /* NEED TO KNOW HOW MANY VECTORS ARE IN THIS FILE */

    /* open input file */
    inhandle = fopen(argv[1], "r");
    if (inhandle == NULL)
    {
        printf("Can't open file %s.", argv[1]);
        exit(-1);
    }

    vectors = integers_read = 0;    /* input counter */

    /* count vectors */
    while (fscanf(inhandle, "%d", &tempint) != EOF)
    {
        integers_read++;

        /* increment input counter if whole vector was read */
        if (integers_read == N)
        {
            vectors++;
            integers_read = 0;
        }
    }

    rewind(inhandle);    /* reset input file */
    /* PERFORM EXPERIMENT */

    /* open output file */
    outhandle = fopen(outfile, "w");
    if (outhandle == NULL)
    {
        printf("Can't open file %s.", outfile);
        exit(-1);
    }

    integers_read = 0;    /* set integer counter */

    /* re-read whole file */
    k = 1;
    while (k <= vectors)
    {
        fscanf(inhandle, "%d", &tempint);
        integers_read++;
        buff[integers_read] = tempint;

        if (integers_read == N)
        {
            /* convert input characters to floats */
            for (i=1; i<=integers_read; i++)
            {
                /* scale data for an array of floats between 0 and 1 */
                d[2*i-1] = (double) buff[i];
                d[2*i-1] /= MAX_AMP;
            }

            /* compute complex fft */
            four1(d,N,1);    /* overwrite input array */

            /* convert rectangular to polar and keep magnitudes */
            mag(d,n,data_mag,DC);

            /* clear fft buffer */
            for (i=1; i<=2*N; i++)
                d[i] = 0.0;

            /* load real locations of fft array */
            for (i=0; i<=n; i++)
                if (i==0)
                    d[i+1] = DC;
                else d[2*i+1] = data_mag[i];
            for (i=1; i<n; i++)
                d[n+2*i+1] = data_mag[n-i];

            /* compute inverse fft */
            four1(d,N,-1);
            /* write reconstructed data to output file */
            for (i=1; i<=N; i++)
            {
                d[2*i-1] *= MAX_AMP;
                d[2*i-1] /= N;
                fprintf(outhandle, "%d\n", (int) d[2*i-1]);
            }

            /* clear fft buffer */
            for (i=1; i<=2*N; i++)
                d[i] = 0.0;

            integers_read = 0;
            k++;
        }
    }

    fclose(inhandle);
    fclose(outhandle);
}


PHASE  REMOVAL  SIMULATION:  WITH  COMPRESSION 


/*

Command Line INPUTS : compmag source file

Program reads frames of speech from the source file and Fourier transforms
the N time samples into N/2 frequency magnitudes. These magnitudes form vectors
from which a covariance matrix is computed.

The covariance matrix is decomposed, using Singular Value Decomposition
(SVD), into a vector of SVs and two orthogonal matrices of eigenvectors. The
eigenvectors are ranked according to the magnitude of the associated SV
(square root of eigenvalue) and used as the rows of the KL transform matrix.

The inverse KL transform matrix is the transpose of the KL transform.

OUTPUTS : File KLT.dat
        : File averages.dat

Program written by FLTLT Don Dryley at AFIT during August 1992.

*/


#include <fcntl.h>
#include <stdio.h>
#include <math.h>
#include <string.h>
#include <stdlib.h>
#include "recipes.h"

#define MAX_AMP 32768
#define N 256
#define n 128


void main(argc, argv)
int argc;
char *argv[];
{
    /* DECLARE VARIABLES */
    FILE *inhandle, *outhandle;
    int i, j, k, vectors, integers_read, tempint, dim;
    long *buff;
    double *d, *data_mag, *average, **A, *W, **V, DC, *KL;
    char outfile[] = "witcomp.dat";

    /* PROMPT FOR NUMBER OF KL COEFFICIENTS */
    printf("Enter number of coefficients <1 - %d> ... ", n);
    scanf("%d", &dim);
    printf("\n\n");


    /* CHECK ARGUMENTS */
    if (argc != 2)
    {
        printf("Format: D>KLT source.xxx");
        exit(-1);
    }


    /* CREATE ARRAYS, INITIALISED TO ZERO */
    buff = lvector(1,N);        /* input buffer */
    d = dvector(1,2*N);         /* FFT of one frame of speech is complex */
    data_mag = dvector(1,n);    /* Single frequency vector of magnitudes */
    average = dvector(1,n);     /* average of discrete frequencies */
    A = dmatrix(1,n,1,n);       /* Covariance Matrix gets over written */
    V = dmatrix(1,n,1,n);       /* Matrix for right singular vectors */
    W = dvector(1,n);           /* Matrix of singular values */
    KL = dvector(1,dim);        /* matrix of KL coefficients */


    /* NEED TO KNOW HOW MANY VECTORS ARE IN THIS FILE */

    /* open input file */
    inhandle = fopen(argv[1], "r");
    if (inhandle == NULL)
    {
        printf("Can't open file %s.", argv[1]);
        exit(-1);
    }

    vectors = 0;          /* input counter */
    integers_read = 0;    /* integer counter */

    /* count vectors and find average of each vector component */
    while (fscanf(inhandle, "%d", &tempint) != EOF)
    {
        integers_read++;
        buff[integers_read] = tempint;
        /* increment input counter if whole vector was read */
        if (integers_read == N)
        {
            vectors++;    /* update new vector */

            /* convert input characters to floats */
            for (i=1; i<=integers_read; i++)
            {
                /* scale data for an array of floats between 0 and 1 */
                d[2*i-1] = (double) buff[i];
                d[2*i-1] /= MAX_AMP;
            }

            /* compute complex fft */
            four1(d,N,1);    /* overwrite input array */

            /* convert rectangular to polar and keep magnitudes */
            mag(d,n,data_mag,DC);

            /* accumulate like components */
            for (i=1; i<=n; i++)
                average[i] += data_mag[i];

            /* clear input array */
            for (i=1; i<=2*N; i++)
                d[i] = 0.0;

            integers_read = 0;    /* reset for next vector */
        }
    }    /* file flushed */


    /* COMPUTE MEANS */
    for (i=1; i<=n; i++)
        average[i] = average[i]/vectors;


    /* DETERMINE COVARIANCE MATRIX */

    /* ensure input array cleared */
    for (i=1; i<=2*N; i++)
        d[i] = 0;

    /* reset input file */
    rewind(inhandle);

    integers_read = 0;


    /* re-read whole file */
    k = 1;
    while (k <= vectors)
    {
        fscanf(inhandle, "%d", &tempint);
        integers_read++;
        buff[integers_read] = tempint;

        if (integers_read == N)
        {
            /* convert input characters to floats */
            for (i=1; i<=integers_read; i++)
            {
                /* scale data for an array of floats between 0 and 1 */
                d[2*i-1] = (double) buff[i];
                d[2*i-1] /= MAX_AMP;
            }

            /* compute complex fft */
            four1(d,N,1);    /* overwrite input array */

            /* convert rectangular to polar and keep magnitudes */
            mag(d,n,data_mag,DC);

            /* subtract mean from each component */
            for (i=1; i<=n; i++)
                data_mag[i] = data_mag[i] - average[i];

            /* update covariance matrix */
            for (i=1; i<=n; i++)
                for (j=1; j<=n; j++)
                    A[i][j] += data_mag[i] * data_mag[j];

            /* clear input array */
            for (i=1; i<=2*N; i++)
                d[i] = 0;

            integers_read = 0;
            k++;
        }
    }


    /* DECOMPOSE COVARIANCE MATRIX */
    svdcmp(A,n,n,W,V);
    eigsrt(W,A,n);    /* columns of A are ranked rows of KL transform matrix */
    /* COMPRESS SPEECH USING KL TRANSFORM */

    /* open output file */
    outhandle = fopen(outfile, "w");
    if (outhandle == NULL)
    {
        printf("Can't open file %s.", outfile);
        exit(-1);
    }

    /* ensure input array cleared */
    for (i=1; i<=2*N; i++)
        d[i] = 0;

    /* reset input file */
    rewind(inhandle);

    integers_read = 0;

    /* re-read whole file */
    k = 1;
    while (k <= vectors)
    {
        fscanf(inhandle, "%d", &tempint);
        integers_read++;
        buff[integers_read] = tempint;

        if (integers_read == N)
        {
            /* convert input characters to floats */
            for (i=1; i<=integers_read; i++)
            {
                /* scale data for an array of floats between 0 and 1 */
                d[2*i-1] = (double) buff[i];
                d[2*i-1] /= MAX_AMP;
            }

            /* compute complex fft */
            four1(d,N,1);    /* overwrite input array */

            /* convert rectangular to polar and keep magnitudes */
            mag(d,n,data_mag,DC);

            /* clear fft buffer */
            for (i=1; i<=2*N; i++)
                d[i] = 0.0;

            /* subtract mean from each component */
            for (i=1; i<=n; i++)
                data_mag[i] -= average[i];
            /* compress data */
            for (i=1; i<=dim; i++)
            {
                KL[i] = 0.0;
                for (j=1; j<=n; j++)
                    KL[i] += A[j][i] * data_mag[j];
            }

            /* expand data by inverse transformation */
            for (i=1; i<=n; i++)
            {
                data_mag[i] = 0.0;
                for (j=1; j<=dim; j++)
                    data_mag[i] += A[i][j] * KL[j];
            }

            /* add mean of each component */
            for (i=1; i<=n; i++)
                data_mag[i] += average[i];

            /* load real locations of fft array */
            for (i=0; i<=n; i++)
                if (i==0)
                    d[i+1] = DC;
                else d[2*i+1] = data_mag[i];
            for (i=1; i<n; i++)
                d[n+2*i+1] = data_mag[n-i];

            /* compute inverse fft */
            four1(d,N,-1);

            /* write reconstructed data to output file */
            for (i=1; i<=N; i++)
            {
                d[2*i-1] *= MAX_AMP;
                d[2*i-1] /= N;
                fprintf(outhandle, "%d\n", (int) d[2*i-1]);
            }

            /* clear input array */
            for (i=1; i<=2*N; i++)
                d[i] = 0;

            integers_read = 0;
            k++;
        }
    }

    fclose(inhandle);
    fclose(outhandle);
}


LISTING  ONE 


/* AVERAGES MATRICES CONSTRUCTION PROGRAM

Command Line INPUTS : average speechfile.dat

Program tests if the two average files exist. If the files do not exist
they are created and two 128 element column vectors are loaded with zeros.

If (when) the files exist the speech file, specified at the command line,
is opened and discrete-time samples are Fourier transformed to generate vectors
of N complex Fourier coefficients.

The complex vectors are separated into two real vectors, one for the real
components and one for the imaginary components. These vectors are used to
update the two average vectors.

Outputs provided : avgs_Re.dat
                 : avgs_Im.dat

Written by FLTLT Don Dryley at AFIT Sep 1992

*/


#include <fcntl.h>
#include <stdio.h>
#include <math.h>
#include <string.h>
#include <stdlib.h>
#include "recipes.h"

#define N 256
#define n 128


void main(argc, argv)
int argc;
char *argv[];
{
    FILE *inhandle, *out1handle, *out2handle;
    int i, j, k, x, MAX_AMP, vectors, New_vectors, integers_read,
        tempint;
    long *buff;
    float tempfloat;
    double *d, *data_R, *data_I, *averageR, *averageI, DC, zero = 0.0;
    char averagesR[] = "avgs_Re.dat",
         averagesI[] = "avgs_Im.dat";

    /* CHECK ARGUMENTS */
    if (argc != 2)
    {
        printf("Format: D>Cov_C source.xxx");
        exit(-1);
    }


    /* CREATE ARRAYS, INITIALISED TO ZERO */
    buff = lvector(1,N);        /* input buffer */
    d = dvector(1,2*N);         /* FFT of one frame of speech is complex */
    data_R = dvector(1,n);      /* Single vector of real coefficients */
    data_I = dvector(1,n);      /* Single vector of imaginary coefficients */
    averageR = dvector(1,n);    /* average of real coefficients */
    averageI = dvector(1,n);    /* average of imaginary coefficients */


    /* OPEN AVERAGES FILES */

    /* open file of Real averages */
    if ((out1handle = fopen(averagesR, "r")) != NULL)
    {
        /* file exists */
        fclose(out1handle);
        out1handle = fopen(averagesR, "r+");
        if (out1handle == NULL)
        {
            printf("Can't open file %s, Exiting to system\n", averagesR);
            exit(-1);
        }
    }
    else
    {
        /* create file for writing and reading */
        out1handle = fopen(averagesR, "w+");
        if (out1handle == NULL)
        {
            printf("Can't open file %s, Exiting to system\n", averagesR);
            exit(-1);
        }

        /* write zero averages and zero vectors to file */
        for (i=1; i<=n; i++)
            fprintf(out1handle, "%e\n", (float) zero);
        fprintf(out1handle, "%d\n", (int) zero);

        /* reset pointer to start of file */
        rewind(out1handle);
    }
    /* LOAD REAL ARRAY FROM FILE */
    for (i=1; i<=n; i++)
    {
        fscanf(out1handle, "%e", &tempfloat);
        averageR[i] = (double) tempfloat;
    }

    /* read number of contributing vectors */
    fscanf(out1handle, "%d", &vectors);

    /* reset pointer to start of file */
    rewind(out1handle);

    /* scale averages back to accumulated sums */
    for (i=1; i<=n; i++)
        averageR[i] *= vectors;

    /* open file of Imaginary averages */
    if ((out2handle = fopen(averagesI, "r")) != NULL)
    {
        /* file exists */
        fclose(out2handle);
        out2handle = fopen(averagesI, "r+");
        if (out2handle == NULL)
        {
            printf("Can't open file %s, Exiting to system\n", averagesI);
            exit(-1);
        }
    }
    else
    {
        /* create file for writing and reading */
        out2handle = fopen(averagesI, "w+");
        if (out2handle == NULL)
        {
            printf("Can't open file %s, Exiting to system\n", averagesI);
            exit(-1);
        }

        /* write zero averages and zero vectors to file */
        for (i=1; i<=n; i++)
            fprintf(out2handle, "%e\n", (float) zero);
        fprintf(out2handle, "%d\n", (int) zero);

        /* reset pointer to start of file */
        rewind(out2handle);
    }


    /* LOAD IMAGINARY ARRAY FROM FILE */
    for (i=1; i<=n; i++)
    {
        fscanf(out2handle, "%e", &tempfloat);
        averageI[i] = (double) tempfloat;
    }

    /* reset pointer to start of file */
    rewind(out2handle);

    /* scale averages back to accumulated sums */
    for (i=1; i<=n; i++)
        averageI[i] *= vectors;


    /* OPEN SPEECH DATA FILE */
    inhandle = fopen(argv[1], "r");
    if (inhandle == NULL)
    {
        printf("Can't open file %s.", argv[1]);
        exit(-1);
    }

    /* read entire file and find maximum amplitude */
    fscanf(inhandle, "%d", &MAX_AMP);
    while (fscanf(inhandle, "%d", &tempint) != EOF)
    {
        if (abs(MAX_AMP) < abs(tempint))
            MAX_AMP = tempint;
    }

    rewind(inhandle);


    /* FOURIER TRANSFORM SPEECH DATA */
    for (x=1; x<=2; x++)
    {
        if (x==2)    /* offset pointer for 50% overlap */
            for (i=1; i<=n; i++)
                fscanf(inhandle, "%d", &tempint);

        /* count vectors and find average of each vector component */
        integers_read = New_vectors = 0;

        while (fscanf(inhandle, "%d", &tempint) != EOF)
        {
            integers_read++;
            buff[integers_read] = tempint;
            /* increment input counter if whole vector was read */
            if (integers_read == N)
            {
                New_vectors++;    /* update new vector */
                vectors++;

                /* convert input characters to doubles */
                for (i=1; i<=N; i++)
                {
                    d[2*i-1] = (double) buff[i];
                    d[2*i-1] /= abs(MAX_AMP);
                }

                /* compute complex fft */
                four1(d,N,1);    /* overwrite input array */

                /* collect real and imaginary vectors, ignore DC */
                for (i=1; i<=n; i++)
                {
                    data_R[i] = d[2*i+1];
                    data_I[i] = d[2*i+2];
                }

                /* accumulate like components */
                for (i=1; i<=n; i++)
                {
                    averageR[i] += data_R[i];
                    averageI[i] += data_I[i];
                }

                /* clear input array */
                for (i=1; i<=2*N; i++)
                    d[i] = 0.0;

                integers_read = 0;    /* reset for next vector */
            }
        }

        rewind(inhandle);
    }


    /* UPDATE AVERAGES AND SAVE TO FILES */
    for (i=1; i<=n; i++)
    {
        averageR[i] /= vectors;
        averageI[i] /= vectors;
    }

    for (i=1; i<=n; i++)
        fprintf(out1handle, "%e\n", averageR[i]);
    fprintf(out1handle, "%d\n", vectors);
    fclose(out1handle);

    for (i=1; i<=n; i++)
        fprintf(out2handle, "%e\n", averageI[i]);
    fprintf(out2handle, "%d\n", vectors);
    fclose(out2handle);
}


LISTING  TWO 


/* COVARIANCE AND AVERAGES MATRICES CONSTRUCTION PROGRAM

Command Line INPUTS : cov speechfile.dat

Program tests if the covariance files exist. If the files do not exist
they are created and loaded with zeros in a 512 x 512 matrix. If the files
exist the speech file is opened and used to update the covariance matrix.

Associated with the covariance matrices are files of averages which hold
the average value of the complex frequency components used to build the
covariance matrices.

Outputs provided : covar_Re.dat
                 : covar_Im.dat

Written by FLTLT Don Dryley at AFIT Sep 1992

*/


#include  <fcntl.h> 
#include  <stdio.h> 
#include  <math.h> 
#include  <string.h> 
#include  <stdlib.h>
#include  "recipes.h"


#define  N  256 
#define  n  128 


void main(argc,argv)
int argc;
char *argv[];
{
    FILE    *inhandle, *out1handle, *out2handle;
    int     i, j, k, x, MAX_AMP, vectors, New_vectors, integers_read,
            tempint;
    long    *buff;
    float   tempfloat;
    double  *d, *data_R, *data_I, *averageR, **A_R, *averageI, **A_I, *W,
            **V, **Cov, DC, zero = 0.0;
    char    averagesR[] = "avgs_Re.dat",
            averagesI[] = "avgs_Im.dat",
            covarianceR[] = "covar_Re.dat",
            covarianceI[] = "covar_Im.dat";


    /* CHECK ARGUMENTS */
    if (argc != 2)
    {   printf("Format: D>Cov_C source.xxx");
        exit(-1);
    }


    /* CREATE ARRAYS, INITIALISED TO ZERO */
    buff     = lvector(1,N);      /* input buffer */
    d        = dvector(1,2*N);    /* FFT of one frame of speech is complex */
    data_R   = dvector(1,n);      /* Single vector of real coefficients */
    data_I   = dvector(1,n);      /* Single vector of imaginary coefficients */
    averageR = dvector(1,n);      /* average of real components */
    averageI = dvector(1,n);      /* average of imaginary components */
    A_R      = dmatrix(1,n,1,n);  /* covariance matrix of real vectors */
    A_I      = dmatrix(1,n,1,n);  /* covariance matrix of imaginary vectors */
    V        = dmatrix(1,N,1,N);  /* Matrix of eigenvectors */
    W        = dvector(1,N);      /* Matrix of eigenvalues */


    /* OPEN AVERAGES FILES */

    /* open file of Real averages */
    out1handle = fopen(averagesR,"r");
    if (out1handle == NULL)
    {
        printf("Can't open file %s, Exiting to system\n",averagesR);
        exit(-1);
    }

    /* load real array */
    for (i=1; i<=n; i++)
    {
        fscanf(out1handle,"%e",&tempfloat);
        averageR[i] = (double) tempfloat;
    }
    fclose(out1handle);

    /* open file of Imaginary averages */
    out1handle = fopen(averagesI,"r");
    if (out1handle == NULL)
    {
        printf("Can't open file %s, Exiting to system\n",averagesI);
        exit(-1);
    }

    /* load imaginary array */
    for (i=1; i<=n; i++)
    {
        fscanf(out1handle,"%e",&tempfloat);
        averageI[i] = (double) tempfloat;
    }
    fclose(out1handle);


    /* OPEN SPEECH DATA FILE */
    inhandle = fopen(argv[1],"r");
    if (inhandle == NULL)
    {
        printf("Can't open file %s.",argv[1]);
        exit(-1);
    }

    /* read entire file and find maximum amplitude */
    New_vectors = integers_read = 0;
    fscanf(inhandle,"%d",&MAX_AMP);
    while (fscanf(inhandle,"%d",&tempint) != EOF)
    {
        if (abs(MAX_AMP) < abs(tempint))
            MAX_AMP = tempint;
        integers_read++;
        if (integers_read == N)
        {
            New_vectors++;
            integers_read = 0;
        }
    }
    rewind(inhandle);


    /* LOAD COVARIANCE MATRICES */

    /* open Real covariance file */
    if ((out1handle = fopen(covarianceR,"r")) != NULL)
    {
        /* file exists */
        fclose(out1handle);
        out1handle = fopen(covarianceR,"r+");
        if (out1handle == NULL)
        {
            printf("Can't open file %s, Exiting to system\n",
                   covarianceR);
            exit(-1);
        }
    }
    else
    {
        /* create file for writing and reading */
        out1handle = fopen(covarianceR,"w+");
        if (out1handle == NULL)
        {
            printf("Can't open file %s, Exiting to system\n",covarianceR);
            exit(-1);
        }

        /* write zero covariances to file */
        for (i=1; i<=n; i++)
            for (j=1; j<=n; j++)
                fprintf(out1handle,"%e\n",(float) zero);

        /* reset pointer to start of file */
        rewind(out1handle);
    }

    /* load covariance array */
    for (i=1; i<=n; i++)
        for (j=1; j<=n; j++)
        {
            fscanf(out1handle,"%e",&tempfloat);
            A_R[i][j] = (double) tempfloat;
        }
    rewind(out1handle);

    /* open Imaginary covariance file */
    if ((out2handle = fopen(covarianceI,"r")) != NULL)
    {
        /* file exists */
        fclose(out2handle);
        out2handle = fopen(covarianceI,"r+");
        if (out2handle == NULL)
        {
            printf("Can't open file %s, Exiting to system\n",covarianceI);
            exit(-1);
        }
    }
    else
    {
        /* create file for writing and reading */
        out2handle = fopen(covarianceI,"w+");
        if (out2handle == NULL)
        {
            printf("Can't open file %s, Exiting to system\n",covarianceI);
            exit(-1);
        }

        /* write zero covariances to file */
        for (i=1; i<=n; i++)
            for (j=1; j<=n; j++)
                fprintf(out2handle,"%e\n",(float) zero);

        /* reset pointer to start of file */
        rewind(out2handle);
    }

    /* load covariance array */
    for (i=1; i<=n; i++)
        for (j=1; j<=n; j++)
        {
            fscanf(out2handle,"%e",&tempfloat);
            A_I[i][j] = (double) tempfloat;
        }
    rewind(out2handle);


    /* UPDATE COVARIANCE MATRICES */

    /* two passes of file, 0% and 50% overlap */
    for (x=1; x<=2; x++)
    {
        if (x==2)  /* offset pointer for 50% overlap */
            for (i=1; i<=n; i++)
                fscanf(inhandle,"%d",&tempint);

        /* update covariance matrix with frequency vectors */
        integers_read = k = 0;  /* integer counter */
        while (k<New_vectors)
        {
            fscanf(inhandle,"%d",&tempint);
            integers_read++;
            buff[integers_read] = tempint;

            /* increment input counter if whole vector was read */
            if (integers_read == N)
            {
                /* convert input characters to doubles */
                for (i=1; i<=N; i++)
                {
                    d[2*i-1] = (double) buff[i];
                    d[2*i-1] /= MAX_AMP;
                }

                /* compute complex fft */
                four1(d,N,1);  /* overwrite input array */

                /* collect real and imaginary vectors, ignore DC */
                for (i=1; i<=n; i++)
                {
                    data_R[i] = d[2*i+1];
                    data_I[i] = d[2*i+2];
                }

                /* subtract average of each component */
                for (i=1; i<=n; i++)
                {
                    data_R[i] -= averageR[i];
                    data_I[i] -= averageI[i];
                }

                /* update covariance matrix */
                for (i=1; i<=n; i++)
                    for (j=1; j<=n; j++)
                    {
                        A_R[i][j] += data_R[i]*data_R[j];
                        A_I[i][j] += data_I[i]*data_I[j];
                    }

                /* clear input array */
                for (i=1; i<=2*N; i++)
                    d[i] = 0.0;

                integers_read = 0;  /* reset for next vector */
                k++;
            }
        }
        rewind(inhandle);
    }

    /* write covariances to file */
    for (i=1; i<=n; i++)
        for (j=1; j<=n; j++)
            fprintf(out1handle,"%e\n",(float) A_R[i][j]);
    fclose(out1handle);
    for (i=1; i<=n; i++)
        for (j=1; j<=n; j++)
            fprintf(out2handle,"%e\n",(float) A_I[i][j]);
    fclose(out2handle);
}
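
The covariance update in Listing Two adds the outer product of each mean-removed frequency vector to a running sum. A minimal stand-alone sketch of that step follows. The names `cov_accumulate` and `cov_normalise`, the 0-based row-major storage and the final division by the vector count are assumptions made for illustration; the listing itself stores the un-normalised sums.

```c
#include <stddef.h>

/* Hypothetical helper: add (x - mean)(x - mean)^T to a row-major
   n x n accumulator. Indices are 0-based, unlike the listing. */
void cov_accumulate(double *cov, const double *x, const double *mean, size_t n)
{
    size_t i, j;
    for (i = 0; i < n; i++) {
        double di = x[i] - mean[i];
        for (j = 0; j < n; j++)
            cov[i * n + j] += di * (x[j] - mean[j]);
    }
}

/* Hypothetical helper: divide the accumulated sums by the number of
   vectors to obtain the (biased) sample covariance. */
void cov_normalise(double *cov, size_t n, size_t count)
{
    size_t i;
    if (count == 0)
        return;
    for (i = 0; i < n * n; i++)
        cov[i] /= (double) count;
}
```

Calling `cov_accumulate` once per frame and `cov_normalise` once at the end keeps the matrix symmetric by construction, which is what the eigenvector step in Listing Three relies on.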




LISTING  THREE 


/*  KARHUNEN  -  LOEVE  TRANSFORMATION  PROGRAM


Command  Line  Inputs  :  klt


This  program  decomposes  the  two  covariance  matrices  using  Singular  Value
Decomposition.  The  SVD  routine  was  extracted  from  Numerical  Recipes  in  C
(Cambridge  Press).

The  eigenvectors  of  the  covariance  matrix  overwrite  the  columns  of  the
covariance  matrix  and  are  written  to  files  as  the  Karhunen-Loeve  Transform
matrices.


OUTPUTS  :  File  KLT_Re.dat
         :  File  KLT_Im.dat


Program  written  by  FLTLT  Don  Dryley  at  AFIT,  Sep  1992

*/


#include  <fcntl.h>
#include  <stdio.h>
#include  <math.h>
#include  <string.h>
#include  <stdlib.h>
#include  "recipes.h"


#define  MAX_AMP  32768
#define  N  256
#define  n  128


void main(argc,argv)
int argc;
char *argv[];
{
    FILE    *inhandle, *out1handle, *out2handle;
    int     i, j, k, x, vectors, integers_read, tempint;
    float   tempfloat;
    double  **A_R, **A_I, *W, **V;
    char    covarianceR[] = "covar_Re.dat",
            covarianceI[] = "covar_Im.dat",
            transformR[] = "KLT_Re.dat",
            transformI[] = "KLT_Im.dat";


    /* CREATE ARRAYS, INITIALISED TO ZERO */
    A_R = dmatrix(1,n,1,n);  /* covariance of real coefficients */
    A_I = dmatrix(1,n,1,n);  /* covariance of imaginary coefficients */
    V   = dmatrix(1,n,1,n);  /* Matrix of eigenvectors */
    W   = dvector(1,n);      /* Matrix of eigenvalues */

    /* LOAD COVARIANCE MATRICES */

    /* open Real covariance file */
    out1handle = fopen(covarianceR,"r");
    if (out1handle == NULL)
    {
        printf("Can't open file %s, Exiting to system\n",covarianceR);
        exit(-1);
    }

    /* load covariance array */
    for (i=1; i<=n; i++)
        for (j=1; j<=n; j++)
        {
            fscanf(out1handle,"%e",&tempfloat);
            A_R[i][j] = (double) tempfloat;
        }
    fclose(out1handle);

    /* open Imaginary covariance file */
    out2handle = fopen(covarianceI,"r");
    if (out2handle == NULL)
    {
        printf("Can't open file %s, Exiting to system\n",covarianceI);
        exit(-1);
    }

    /* load covariance array */
    for (i=1; i<=n; i++)
        for (j=1; j<=n; j++)
        {
            fscanf(out2handle,"%e",&tempfloat);
            A_I[i][j] = (double) tempfloat;
        }
    fclose(out2handle);


    /* FORM KL TRANSFORMS */

    /* find and sort eigenvectors of A_R and A_I */
    svdcmp(A_R,n,n,W,V);
    eigsrt(W,A_R,n);

    /* open KL_R transform file */
    out1handle = fopen(transformR,"w");
    if (out1handle == NULL)
    {   printf("Can't open file %s",transformR);
        exit(-1);
    }

    /* write real part of eigenvectors to KL_R transform file */
    for (i=1; i<=n; i++)
        for (j=1; j<=n; j++)
            fprintf(out1handle,"%e\n",(float) A_R[j][i]);
    fclose(out1handle);


    svdcmp(A_I,n,n,W,V);
    eigsrt(W,A_I,n);

    /* open KL_I transform file */
    out1handle = fopen(transformI,"w");
    if (out1handle == NULL)
    {   printf("Can't open file %s",transformI);
        exit(-1);
    }

    /* write imaginary parts of eigenvectors to KL_I transform file */
    for (i=1; i<=n; i++)
        for (j=1; j<=n; j++)
            fprintf(out1handle,"%e\n",(float) A_I[j][i]);
    fclose(out1handle);
}
LISTING  FOUR


/*  SPEECH  REDUCTION  PROGRAM


Command  Line  Inputs  :  reduce  speech.dat


Speech  source  files  are  first  converted  from  the  NeXT  sound  format  into  a
data  format  using  the  program  sound_to.  Sound_to  creates  two  files,  a  header
file  which  contains  the  source  file's  header  information  and  a  data  file  of
integers  between  -32768  and  32768  containing  the  speech.

This  program  uses  the  original  sound  file's  header  with  the  file  size
adjusted  to  the  reconstructed  file  size  (usually  not  same  size  as  original).

The  program  prompts  the  user  for  the  number  of  coefficients  used  for  the
reduction  experiment.


Outputs  :  temp_2.dat


Program  written  by  FLTLT  Don  Dryley  at  AFIT  Sep  92

*/


#include  <fcntl.h>
#include  <stdio.h>
#include  <math.h>
#include  <string.h>
#include  <stdlib.h>
#include  "recipes.h"


#define  MAX_AMP  32768
#define  N  256
#define  n  128


typedef struct
{
    int magic;
    int DataLocation;
    int DataSize;
    int DataFormat;
    int SamplingRate;
    int ChannelCount;
    char info[4];
} SNDheader;
void transform(trans_R,trans_I,dim,in_R,in_I,size,KL_R,KL_I)
double *trans_R, *trans_I, *in_R, *in_I, **KL_R, **KL_I;
int dim, size;

/* function transforms two input vectors of dimension size */
/* into two vectors of coefficients of dimension dim using */
/* the transformation matrices KL_R and KL_I */
{
    int i,j;

    /* transform = KL multiplied by in_vector */
    for (i=1; i<=dim; i++)
    {
        trans_R[i] = trans_I[i] = 0.0;
        for (j=1; j<=size; j++)
        {
            trans_R[i] += KL_R[i][j] * in_R[j];
            trans_I[i] += KL_I[i][j] * in_I[j];
        }
    }
}

void inverse_transform(inv_R,inv_I,size,red_R,red_I,dim,KL_R,KL_I)
double *inv_R, *inv_I, *red_R, *red_I, **KL_R, **KL_I;
int size, dim;

/* function inverse transforms two vectors of coefficients of */
/* dimension dim into two output vectors of dimension size */
/* using the transformation matrices KL_R and KL_I */
{
    int i,j;

    /* inverse transform = KLt multiplied by coefficients */
    for (i=1; i<=size; i++)
    {
        inv_R[i] = inv_I[i] = 0.0;
        for (j=1; j<=dim; j++)
        {
            inv_R[i] += KL_R[j][i] * red_R[j];
            inv_I[i] += KL_I[j][i] * red_I[j];
        }
    }
}
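
The two routines above form a truncated Karhunen-Loeve pair: the forward step keeps only the first dim coefficients, and the inverse step rebuilds a length-size vector from them using the transpose of the same rows. The following sketch shows the same idea for a single vector. The names `klt_forward` and `klt_inverse`, the flat row-major basis and the 0-based indexing are assumptions for illustration, and the basis rows are assumed to be orthonormal.

```c
#include <stddef.h>

/* Hypothetical forward step: coef[i] = sum_j basis[i][j] * in[j]
   for the first dim rows of a row-major basis with rows of length size. */
void klt_forward(const double *basis, size_t dim, size_t size,
                 const double *in, double *coef)
{
    size_t i, j;
    for (i = 0; i < dim; i++) {
        coef[i] = 0.0;
        for (j = 0; j < size; j++)
            coef[i] += basis[i * size + j] * in[j];
    }
}

/* Hypothetical inverse step: out[j] = sum_i basis[i][j] * coef[i],
   i.e. multiplication by the transpose of the retained rows. */
void klt_inverse(const double *basis, size_t dim, size_t size,
                 const double *coef, double *out)
{
    size_t i, j;
    for (j = 0; j < size; j++) {
        out[j] = 0.0;
        for (i = 0; i < dim; i++)
            out[j] += basis[i * size + j] * coef[i];
    }
}
```

With dim equal to size the round trip reproduces the input up to rounding. With a smaller dim the output is the projection of the input onto the retained basis rows, which is where the compression loss comes from.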


void main(argc,argv)
int argc;
char *argv[];
{
    SNDheader SND;
    FILE      *inhandle, *outhandle;
    short     *tempdata;
    int       i, j, k, vectors, integers_read, tempint, dim;
    long      *buff, templong;
    float     tempfloat;
    double    **KL_R, **KL_I, *averageR, *averageI, *d, *org_R, *org_I,
              *rec_R, *rec_I, *reduce_R, *reduce_I, DC, magnitude;
    char      KLT_R[] = "KLT_Re.dat",
              KLT_I[] = "KLT_Im.dat",
              AVG_R[] = "avgs_Re.dat",
              AVG_I[] = "avgs_Im.dat",
              temp[] = "temp_2.dat";


    /* CHECK ARGUMENTS */
    if (argc != 2)
    {   printf("Format: D>reduce speech.dat ");
        exit(-1);
    }


    /* PROMPT FOR NUMBER OF KL COEFFICIENTS */
    printf("Enter number of coefficients <1 - %d> ... ",n);
    scanf("%d",&dim);
    printf("\n\n");


    /* CREATE MATRICES */
    buff     = lvector(1,N);      /* input buffer */
    d        = dvector(1,2*N);    /* FFT of one frame of speech is complex */
    averageR = dvector(1,n);      /* average of discrete frequencies */
    averageI = dvector(1,n);      /* average of discrete frequencies */
    KL_R     = dmatrix(1,n,1,n);  /* hold KL transform */
    KL_I     = dmatrix(1,n,1,n);  /* hold KL transform */
    org_R    = dvector(1,n);      /* holds original frequency vector */
    org_I    = dvector(1,n);      /* holds original frequency vector */
    rec_R    = dvector(1,n);      /* holds reconstructed frequency vector */
    rec_I    = dvector(1,n);      /* holds reconstructed frequency vector */
    reduce_R = dvector(1,n);      /* holds set of coefficients */
    reduce_I = dvector(1,n);      /* holds set of coefficients */


    /* READ TRANSFORM FROM FILE */

    /* open Real transform file */
    inhandle = fopen(KLT_R,"r");
    if (inhandle == NULL)
    {   printf("Can't open file %s.",KLT_R);
        exit(-1);
    }

    /* load KL from transform file */
    for (i=1; i<=n; i++)
        for (j=1; j<=n; j++)
        {
            fscanf(inhandle,"%e",&tempfloat);
            KL_R[i][j] = (double) tempfloat;
        }
    fclose(inhandle);

    /* open Imaginary transform file */
    inhandle = fopen(KLT_I,"r");
    if (inhandle == NULL)
    {   printf("Can't open file %s.",KLT_I);
        exit(-1);
    }

    /* load KL from transform file */
    for (i=1; i<=n; i++)
        for (j=1; j<=n; j++)
        {
            fscanf(inhandle,"%e",&tempfloat);
            KL_I[i][j] = (double) tempfloat;
        }
    fclose(inhandle);


    /* READ FREQUENCY AVERAGES FROM FILE */

    /* open real averages file */
    inhandle = fopen(AVG_R,"r");
    if (inhandle == NULL)
    {   printf("Can't open file %s.",AVG_R);
        exit(-1);
    }

    /* load averages from averages file */
    for (j=1; j<=n; j++)
    {
        fscanf(inhandle,"%e",&tempfloat);
        averageR[j] = (double) tempfloat;
    }
    fclose(inhandle);

    /* open imaginary averages file */
    inhandle = fopen(AVG_I,"r");
    if (inhandle == NULL)
    {   printf("Can't open file %s.",AVG_I);
        exit(-1);
    }

    /* load averages from averages file */
    for (j=1; j<=n; j++)
    {
        fscanf(inhandle,"%e",&tempfloat);
        averageI[j] = (double) tempfloat;
    }
    fclose(inhandle);


    /* KL TRANSFORMATION EXPERIMENT */

    /* open temporary file for data */
    outhandle = fopen(temp,"w");
    if (outhandle == NULL)
    {   printf("Can't open file %s.",temp);
        exit(-1);
    }

    /* open source file for reading */
    inhandle = fopen(argv[1],"r");
    if (inhandle == NULL)
    {   printf("Can't open file %s.",argv[1]);
        exit(-1);
    }

    /* count number of vectors in source file */
    integers_read = vectors = 0;
    while (fscanf(inhandle,"%d",&tempint) != EOF)
    {
        integers_read++;
        if (integers_read == N)
        {
            integers_read = 0;
            vectors++;
        }
    }
    rewind(inhandle);


    /* KL TRANSFORM SPEECH */
    integers_read = 0;
    j = 1;
    while (j <= vectors)
    {
        fscanf(inhandle,"%d",&tempint);
        integers_read++;
        buff[integers_read] = tempint;

        /* transform a vector */
        if (integers_read == N)
        {
            /* load input array to fourier transform */
            for (k=1; k<=N; k++)
            {
                d[2*k-1] = (double) buff[k];
                d[2*k-1] /= MAX_AMP;
            }

            /* overwrite input with complex frequencies */
            four1(d,N,1);

            /* extract real, imaginary and DC components */
            rect(d,n,org_R,org_I,&DC);

            /* subtract mean from each component */
            for (k=1; k<=n; k++)
            {
                org_R[k] -= averageR[k];
                org_I[k] -= averageI[k];
            }

            /* generate KL coefficients */
            transform(reduce_R,reduce_I,dim,org_R,org_I,n,KL_R,KL_I);

            /* reconstruct frequency vector from KL coefficients */
            inverse_transform(rec_R,rec_I,n,reduce_R,reduce_I,dim,KL_R,
                              KL_I);

            /* clear input array for inverse FFT */
            for (k=1; k<=2*N; k++)
                d[k] = 0.0;

            /* Insert DC value */
            d[1] = DC;

            /* load input array with reconstructed spectrum */
            for (k=2; k<=n+1; k++)
            {
                d[2*k-1] = rec_R[k-1] + averageR[k-1];
                d[2*k]   = rec_I[k-1] + averageI[k-1];
            }

            /* upper half of spectrum is the complex conjugate mirror image */
            for (k=2; k<=n; k++)
            {
                d[N+2*k-1] = rec_R[n-k+1] + averageR[n-k+1];
                d[N+2*k]   = -1.0 * (rec_I[n-k+1] + averageI[n-k+1]);
            }

            /* reconstruct speech from reconstructed vector */
            four1(d,N,-1);

            /* write reconstructed time domain data to output file */
            for (k=1; k<=N; k++)
            {
                d[2*k-1] *= MAX_AMP;
                d[2*k-1] /= N;
                fprintf(outhandle,"%d\n",(int) d[2*k-1]);
            }

            /* clear input array for next vector */
            for (k=1; k<=2*N; k++)
                d[k] = 0.0;

            /* increment vector counter */
            j++;

            /* reset elements per vector counter */
            integers_read = 0;
        }
    }
    fclose(inhandle);
    fclose(outhandle);
}
LISTING  FIVE


/*  This  is  the  header  file  for  the  routines  declared  in  recipes.c  */

typedef struct FCOMPLEX
{
    double r,i;
} fcomplex;

void nrerror();
int *ivector();
long *lvector();
float *vector();
double *dvector();
float **matrix();
double **dmatrix();
void free_ivector();
void free_lvector();
void free_vector();
void free_dvector();
void free_matrix();
void free_dmatrix();
fcomplex Complex();
fcomplex Cadd();
fcomplex Csub();
fcomplex Cdiv();
fcomplex Cmul();
fcomplex Cdiv();
fcomplex Conjg();
fcomplex Cinv();
double Cabs();
fcomplex Csqrt();
fcomplex RCmul();
void four1();
void mag();
void rect();
void eigsrt();
void svdcmp();
LISTING  SIX


/*  This  file  contains  routines  from  Numerical  Recipes  in  C  */

#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <fcntl.h>
#include <string.h>


typedef struct FCOMPLEX
{
    double r,i;
} fcomplex;

#define ROTATE(a,i,j,k,l) g=a[i][j];h=a[k][l];a[i][j]=g-s*(h+g*tau);\
    a[k][l]=h+s*(g-h*tau);


#define SWAP(a,b) tempr=(a); (a)=(b); (b)=tempr


/* PYTHAG computes sqrt(sqr(a)+sqr(b)) without */
/* destructive overflow or underflow */

static float at, bt, ct;

#define PYTHAG(a,b) ((at=fabs(a)) > (bt=fabs(b)) ? \
    (ct=bt/at,at*sqrt(1.0+ct*ct)) : (bt ? \
    (ct=at/bt,bt*sqrt(1.0+ct*ct)) : 0.0))


static float maxarg1, maxarg2;

#define MAX(a,b) (maxarg1=(a),maxarg2=(b),(maxarg1) > (maxarg2) ?\
    (maxarg1) : (maxarg2))


#define SIGN(a,b) ((b) >= 0.0 ? fabs(a) : -fabs(a))


void nrerror(error_text)
char error_text[];

/* Numerical Recipes standard error handler */
{
    fprintf(stderr,"Numerical recipes run-time error...\n");
    fprintf(stderr,"%s\n",error_text);
    fprintf(stderr,"...now exiting to system...\n");
    exit(1);
}
int *ivector(nl,nh)
int nl, nh;

/* Allocates a vector of integers from nl to nh */
{
    int *v, i;

    v = (int *) malloc((unsigned) (nh-nl+1)*sizeof(int));
    if (!v) nrerror("allocation failure in ivector()");
    v -= nl;  /* apply the index offset before touching v[nl..nh] */
    for (i=nl; i<=nh; i++)
        v[i] = 0;
    return v;
}


long *lvector(nl,nh)
int nl, nh;

/* allocates a long int vector with range [nl..nh] */
{
    long *v;
    int i;

    v = (long *) malloc((unsigned) (nh-nl+1)*sizeof(long));
    if (!v) nrerror("allocation failure in lvector()");
    v -= nl;  /* apply the index offset before touching v[nl..nh] */
    for (i=nl; i<=nh; i++)
        v[i] = 0;
    return v;
}


float *vector(nl,nh)
int nl, nh;

/* Allocates a float vector with range [nl..nh] */
{
    float *v;
    int i;

    v = (float *) malloc((unsigned) (nh-nl+1)*sizeof(float));
    if (!v) nrerror("allocation failure in vector()");
    v -= nl;  /* apply the index offset before touching v[nl..nh] */
    for (i=nl; i<=nh; i++)
        v[i] = 0;
    return v;
}


double *dvector(nl,nh)
int nl, nh;

/* Allocates a double vector with range [nl..nh] */
{
    double *v;
    int i;

    v = (double *) malloc((unsigned) (nh-nl+1)*sizeof(double));
    if (!v) nrerror("Allocation failure dvector()");
    v -= nl;  /* apply the index offset before touching v[nl..nh] */
    for (i=nl; i<=nh; i++)
        v[i] = 0;
    return v;
}
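
The allocators above return a pointer shifted by nl so that callers can index v[nl..nh], the unit-offset convention used throughout these listings. A simpler way to get 1-based indexing, shown here only as an alternative sketch and not as the method of the listing, is to allocate one extra element and leave index 0 unused. The names `dvector1` and `free_dvector1` are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical alternative: return a zero-filled block usable as
   v[1..len]. Element 0 is allocated but unused, so the returned pointer
   is the one malloc produced and can be passed straight to free(). */
double *dvector1(size_t len)
{
    size_t i;
    double *v = (double *) malloc((len + 1) * sizeof *v);
    if (v != NULL)
        for (i = 0; i <= len; i++)
            v[i] = 0.0;
    return v;   /* NULL on allocation failure */
}

void free_dvector1(double *v)
{
    free(v);
}
```

The cost is one wasted element per vector; the benefit is that no pointer ever refers to memory outside the allocated block.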


float **matrix(nrl,nrh,ncl,nch)
int nrl, nrh, ncl, nch;

/* Allocates a float matrix with range [nrl..nrh][ncl..nch] */
{
    int i,j;
    float **m;

    /* Allocate pointers to rows */
    m = (float **) malloc((unsigned) (nrh-nrl+1)*sizeof(float*));
    if (!m) nrerror("Allocation failure 1 in matrix()");
    m -= nrl;

    /* Allocate rows and set pointers to them */
    for (i=nrl; i<=nrh; i++)
    {   m[i] = (float *) malloc((unsigned) (nch-ncl+1)*sizeof(float));
        if (!m[i]) nrerror("Allocation failure 2 in matrix()");
        m[i] -= ncl;
    }

    for (i=nrl; i<=nrh; i++)
        for (j=ncl; j<=nch; j++)
            m[i][j] = 0.0;

    return m;
}


double **dmatrix(nrl,nrh,ncl,nch)
int nrl, nrh, ncl, nch;

/* Allocates a double matrix with range [nrl..nrh][ncl..nch] */
{
    int i,j;
    double **m;

    /* Allocate pointers to rows */
    m = (double **) malloc((unsigned) (nrh-nrl+1)*sizeof(double*));
    if (!m) nrerror("Allocation failure 1 in dmatrix()");
    m -= nrl;

    /* Allocate rows and set pointers to them */
    for (i=nrl; i<=nrh; i++)
    {   m[i] = (double *) malloc((unsigned) (nch-ncl+1)*sizeof(double));
        if (!m[i]) nrerror("Allocation failure 2 in dmatrix()");
        m[i] -= ncl;
    }

    for (i=nrl; i<=nrh; i++)
        for (j=ncl; j<=nch; j++)
            m[i][j] = 0.0;

    return m;
}

void free_ivector(v,nl)
int *v, nl;

/* Frees an integer vector allocated by ivector() */
{
    free((char*) (v+nl));
}


void free_lvector(v,nl)
long *v;
int nl;

/* Frees a vector of long allocated by lvector() */
{
    free((char*) (v+nl));
}


void free_vector(v,nl)
float *v;
int nl;

/* Frees a float vector allocated by vector() */
{
    free((char*) (v+nl));
}


void free_dvector(v,nl)
double *v;
int nl;

/* Frees a double vector allocated by dvector() */
{
    free((char*) (v+nl));
}


void free_matrix(m,nrl,nrh,ncl)
float **m;
int nrl, nrh, ncl;

/* Frees a matrix allocated with matrix() */
{
    int i;

    for (i=nrh; i>=nrl; i--) free((char*) (m[i]+ncl));
    free((char*) (m+nrl));
}


void free_dmatrix(m,nrl,nrh,ncl)
double **m;
int nrl, nrh, ncl;

/* Frees a matrix allocated with dmatrix() */
{
    int i;

    for (i=nrh; i>=nrl; i--) free((char*) (m[i]+ncl));
    free((char*) (m+nrl));
}


fcomplex Complex(re,im)
double re, im;

/* Returns a complex number with specified real and imaginary parts */
{
    fcomplex c;

    c.r = re;
    c.i = im;
    return c;
}


fcomplex Cadd(a,b)
fcomplex a, b;

/* Returns the complex sum of two complex numbers */
{
    fcomplex c;

    c.r = a.r + b.r;
    c.i = a.i + b.i;
    return c;
}


fcomplex Csub(a,b)
fcomplex a, b;

/* Returns the complex difference of two complex numbers */
{
    fcomplex c;

    c.r = a.r - b.r;
    c.i = a.i - b.i;
    return c;
}


fcomplex Cmul(a,b)
fcomplex a, b;

/* Returns the complex product of two complex numbers */
{
    fcomplex c;

    c.r = a.r*b.r - a.i*b.i;
    c.i = a.i*b.r + a.r*b.i;
    return c;
}


fcomplex Cdiv(a,b)
fcomplex a, b;

/* Returns the complex quotient of two complex numbers */
{
    fcomplex c;
    double r, den;

    if (fabs(b.r) >= fabs(b.i))
    {
        r = b.i/b.r;
        den = b.r + r*b.i;
        c.r = (a.r + r*a.i)/den;
        c.i = (a.i - r*a.r)/den;
    }
    else
    {
        r = b.r/b.i;
        den = b.i + r*b.r;
        c.r = (a.r*r + a.i)/den;
        c.i = (a.i*r - a.r)/den;
    }
    return c;
}


fcomplex Conjg(z)
fcomplex z;

/* Returns the conjugate of a complex number */
{
    fcomplex c;

    c.r = z.r;
    c.i = -1.0*z.i;
    return c;
}


fcomplex Cinv(z)
fcomplex z;

/* Returns the inverse of a complex number */
{
    fcomplex c;

    c = Cdiv(Conjg(z),Cmul(z,Conjg(z)));
    return c;
}
double Cabs(z)
fcomplex z;

/* Returns the absolute value of a complex number */
{
    double x, y, ans, temp;

    x = fabs(z.r);
    y = fabs(z.i);
    if (x == 0.0)
        ans = y;
    else
    if (y == 0.0)
        ans = x;
    else
    if (x > y)
    {
        temp = y / x;
        ans = x * sqrt(1.0 + temp * temp);
    }
    else
    {
        temp = x / y;
        ans = y * sqrt(1.0 + temp * temp);
    }
    return ans;
}


fcomplex Csqrt(z)
fcomplex z;

/* Returns the complex square root of a complex number */
{
    fcomplex c;
    double x, y, w, r;

    if ((z.r == 0.0) && (z.i == 0.0))
        c.r = c.i = 0.0;
    else
    {
        x = fabs(z.r);
        y = fabs(z.i);
        if (x >= y)
        {
            r = y / x;
            w = sqrt(x) * sqrt(0.5 * (1.0 + sqrt(1.0 + r * r)));
        }
        else
        {
            r = x / y;
            w = sqrt(y) * sqrt(0.5 * (r + sqrt(1.0 + r * r)));
        }
        if (z.r >= 0.0)
        {
            c.r = w;
            c.i = z.i / (2.0 * w);
        }
        else
        {
            c.i = (z.i >= 0) ? w : -w;
            c.r = z.i/(2.0 * c.i);
        }
    }
    return c;
}


fcomplex RCmul(x,a)
double x;
fcomplex a;

/* Returns the complex product of a real number and a complex number */
{
    fcomplex c;

    c.r = x * a.r;
    c.i = x * a.i;
    return c;
}


void four1(data_in,nn,isign)
double *data_in;
int nn, isign;

/* This function replaces data_in with the complex FFT */
/* if isign is 1, or with the inverse FFT if isign is -1. */
/* nn is the number of time samples in the input frame */
{
    int n, mmax, m, j, istep, i;
    double wtemp, wr, wpr, wpi, wi, theta;
    double tempr, tempi;

    n = nn << 1;
    j = 1;
    for (i=1; i<n; i+=2)
    {
        if (j > i)
        {
            SWAP(data_in[j],data_in[i]);
            SWAP(data_in[j+1],data_in[i+1]);
        }
        m = n >> 1;
        while (m >= 2 && j > m)
        {
            j -= m;
            m >>= 1;
        }
        j += m;
    }
    mmax = 2;
    while (n > mmax)
    {
        istep = 2*mmax;
        theta = 6.28318530717959/(isign*mmax);
        wtemp = sin(0.5*theta);
        wpr = -2.0*wtemp*wtemp;
        wpi = sin(theta);
        wr = 1.0;
        wi = 0.0;
        for (m=1; m<mmax; m+=2)
        {
            for (i=m; i<=n; i+=istep)
            {
                j = i+mmax;
                tempr = wr*data_in[j]-wi*data_in[j+1];
                tempi = wr*data_in[j+1]+wi*data_in[j];
                data_in[j] = data_in[i]-tempr;
                data_in[j+1] = data_in[i+1]-tempi;
                data_in[i] += tempr;
                data_in[i+1] += tempi;
            }
            wr = (wtemp=wr)*wpr-wi*wpi+wr;
            wi = wi*wpr+wtemp*wpi+wi;
        }
        mmax = istep;
    }
}


void mag(rect_data,length,mag_data,DC)
double *rect_data, *mag_data;
int length, DC;

/* Rect_data is the input of 'length' rectangular complex pairs R + jI. */
/* The magnitude of each pair, mag_data[j], is the output */
{
    int i;
    double temp;

    for (i=1; i<=length+1; i++)
    {
        temp = sqrt(pow(rect_data[2*i-1],2) + pow(rect_data[2*i],2));
        if (i==1)
            DC = temp;  /* NOTE: DC is passed by value, so the caller does not see this */
        else
            mag_data[i-1] = temp;
    }
}


void rect(rect_data,length,data_r,data_i,DC)
double *rect_data, *data_r, *data_i, *DC;
int length;

/* Rect_data is the input of 'length' rectangular complex pairs R + jI. */
/* The arrays data_r and data_i are the respective real and imag components */
{
    int i;
    double temp;

    *DC = rect_data[1];
    for (i=2; i<=length+1; i++)
    {
        data_r[i-1] = rect_data[2*i-1];
        data_i[i-1] = rect_data[2*i];
    }
}


void eigsrt(d,v,n)
double *d, **v;
int n;

/* Sorts the values d[1..n] into descending order and rearranges */
/* the columns of v[1..n][1..n] to match */
{
    int k, j, i;
    double p;

    for (i=1; i<n; i++)
    {
        p = d[k=i];
        for (j=i+1; j<=n; j++)
            if (d[j] >= p) p = d[k=j];
        if (k != i)
        {
            d[k] = d[i];
            d[i] = p;
            for (j=1; j<=n; j++)
            {
                p = v[j][i];
                v[j][i] = v[j][k];
                v[j][k] = p;
            }
        }
    }
}
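
eigsrt() is a selection sort: it finds the largest remaining value, swaps it into place, and swaps the matching columns of the vector matrix so that each vector stays paired with its value. The same idea is sketched below with 0-based indexing and a flat row-major matrix. The name `sort_eigenpairs` and the storage layout are assumptions for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: sort d[0..n-1] into descending order and apply
   the same permutation to the columns of the row-major n x n matrix v. */
void sort_eigenpairs(double *d, double *v, size_t n)
{
    size_t i, j, k;
    for (i = 0; i + 1 < n; i++) {
        k = i;
        for (j = i + 1; j < n; j++)     /* locate largest remaining value */
            if (d[j] > d[k])
                k = j;
        if (k != i) {
            double t = d[i];
            d[i] = d[k];
            d[k] = t;
            for (j = 0; j < n; j++) {   /* swap columns i and k of v */
                t = v[j * n + i];
                v[j * n + i] = v[j * n + k];
                v[j * n + k] = t;
            }
        }
    }
}
```

After sorting, the first dim columns correspond to the largest values, which is the ordering the reduction program relies on when it keeps only the leading coefficients.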


void svdcmp(a,m,n,w,v)
double **a, *w, **v;
int m, n;

/* Given a matrix a[1..m,1..n], this routine computes its singular value */
/* decomposition, A = U.W.Vt. The Matrix U replaces [a] on output. The */
/* diagonal matrix of singular values W is output as a vector w[1..n]. */
/* The matrix V (not the transpose of V, Vt) is output as v[1..n,1..n]. */
/* m must be greater than or equal to n; if it is smaller, then A should */
/* be filled up to square with zero rows */
{
    int flag, i, its, j, jj, k, l, nm;
    double c, f, h, s, x, y, z;
    double anorm=0.0, g=0.0, scale=0.0;
    double *rv1, *dvector();
    void nrerror(), free_vector();

    if (m < n) nrerror("SVDCMP: You must augment with extra zero rows");
    rv1 = dvector(1,n);

    /* Householder reduction to bidiagonal form */
    printf("Householder reduction started\n");
    for (i=1; i<=n; i++)
    {
        l = i+1;
        rv1[i] = scale*g;
        g = s = scale = 0.0;
        if (i <= m)
        {
            for (k=i; k<=m; k++) scale += fabs(a[k][i]);
            if (scale)
            {
                for (k=i; k<=m; k++)
                {
                    a[k][i] /= scale;
                    s += a[k][i]*a[k][i];
                }
                f = a[i][i];
                g = -SIGN(sqrt(s),f);
                h = f*g-s;
                a[i][i] = f-g;
                if (i != n)
                {
                    for (j=l; j<=n; j++)
                    {
                        for (s=0.0,k=i; k<=m; k++)
                            s += a[k][i]*a[k][j];
                        f = s/h;
                        for (k=i; k<=m; k++) a[k][j] += f*a[k][i];
                    }
                }
                for (k=i; k<=m; k++) a[k][i] *= scale;
            }
        }
        w[i] = scale*g;
        g = s = scale = 0.0;
        if (i <= m && i != n)
        {
            for (k=l; k<=n; k++) scale += fabs(a[i][k]);
            if (scale)
            {
                for (k=l; k<=n; k++)
                {
                    a[i][k] /= scale;
                    s += a[i][k]*a[i][k];
                }
                f = a[i][l];
                g = -SIGN(sqrt(s),f);
                h = f*g-s;
                a[i][l] = f-g;
                for (k=l; k<=n; k++) rv1[k] = a[i][k]/h;
                if (i != m)
                {
                    for (j=l; j<=m; j++)
                    {
                        for (s=0.0,k=l; k<=n; k++)
                            s += a[j][k]*a[i][k];
                        for (k=l; k<=n; k++)
                            a[j][k] += s*rv1[k];
                    }
                }
                for (k=l; k<=n; k++) a[i][k] *= scale;
            }
        }
        anorm = MAX(anorm,(fabs(w[i])+fabs(rv1[i])));
    }


    /* Accumulation of right-hand transformation */
    for (i=n; i>=1; i--)
    {
        if (i < n)
        {
            if (g)
            {
                for (j=l; j<=n; j++)
                    v[j][i]=(a[i][j]/a[i][l])/g;
                for (j=l; j<=n; j++)
                {
                    for (s=0.0,k=l; k<=n; k++)
                        s += a[i][k]*v[k][j];
                    for (k=l; k<=n; k++)
                        v[k][j] += s*v[k][i];
                }
            }
            for (j=l; j<=n; j++) v[i][j]=v[j][i]=0.0;
        }
        v[i][i]=1.0;
        g=rv1[i];
        l=i;
    }

    /* Accumulation of left-hand transformations */
    for (i=n; i>=1; i--)
    {
        l=i+1;
        g=w[i];
        if (i < n)
            for (j=l; j<=n; j++) a[i][j]=0.0;
        if (g)
        {
            g=1.0/g;
            if (i != n)
            {
                for (j=l; j<=n; j++)
                {
                    for (s=0.0,k=l; k<=m; k++)
                        s += a[k][i]*a[k][j];
                    f=(s/a[i][i])*g;
                    for (k=i; k<=m; k++)
                        a[k][j] += f*a[k][i];
                }
            }
            for (j=i; j<=m; j++)
                a[j][i] *= g;
        }
        else
        {
            for (j=i; j<=m; j++)
                a[j][i] = 0.0;
        }
        ++a[i][i];
    }

    /* Diagonalization of the bidiagonal form */
    printf("Diagonalization started\n");
    for (k=n; k>=1; k--)                /* loop over singular values */
    {
        for (its=1; its<=30; its++)     /* loop over allowed iterations */
        {
            flag=1;
            for (l=k; l>=1; l--)        /* test for splitting */
            {
                nm=l-1;                 /* note that rv1[1] is always zero */
                if ((double)(fabs(rv1[l])+anorm) == anorm)
                {
                    flag=0;
                    break;
                }
                if ((double)(fabs(w[nm])+anorm) == anorm) break;
            }
            if (flag)
            {
                /* cancellation of rv1[l], if l>1 */
                c=0.0;
                s=1.0;
                for (i=l; i<=k; i++)
                {
                    f=s*rv1[i];
                    rv1[i]=c*rv1[i];
                    if ((double)(fabs(f)+anorm) == anorm) break;
                    g=w[i];
                    h=PYTHAG(f,g);
                    w[i]=h;
                    h=1.0/h;
                    c=g*h;
                    s=(-f*h);
                    for (j=1; j<=m; j++)
                    {
                        y=a[j][nm];
                        z=a[j][i];
                        a[j][nm]=y*c+z*s;
                        a[j][i]=z*c-y*s;
                    }
                }
            }
            z=w[k];
            if (l == k)                 /* convergence */
            {
                if (z < 0.0)            /* Singular value is made non-negative */
                {
                    w[k] = -z;
                    for (j=1; j<=n; j++)
                        v[j][k]=(-v[j][k]);
                }
                break;
            }
            /* as listed, the loop above stops at 30 iterations, */
            /* so this test cannot trigger                       */
            if (its == 100)
                nrerror("No convergence in 100 SVDCMP iterations");
            x=w[l];                     /* Shift from bottom 2-by-2 minor */
            nm=k-1;
            y=w[nm];
            g=rv1[nm];
            h=rv1[k];
            f=((y-z)*(y+z)+(g-h)*(g+h))/(2.0*h*y);
            g=PYTHAG(f,1.0);
            f=((x-z)*(x+z)+h*((y/(f+SIGN(g,f)))-h))/x;

            /* Next QR transformation */
            c=s=1.0;
            for (j=l; j<=nm; j++)
            {
                i=j+1;
                g=rv1[i];
                y=w[i];
                h=s*g;
                g=c*g;
                z=PYTHAG(f,h);
                rv1[j]=z;
                c=f/z;
                s=h/z;
                f=x*c+g*s;
                g=g*c-x*s;
                h=y*s;
                y=y*c;
                for (jj=1; jj<=n; jj++)
                {
                    x=v[jj][j];
                    z=v[jj][i];
                    v[jj][j]=x*c+z*s;
                    v[jj][i]=z*c-x*s;
                }
                z=PYTHAG(f,h);
                w[j]=z;
                if (z)                  /* Rotation can be arbitrary if z=0 */
                {
                    z=1.0/z;
                    c=f*z;
                    s=h*z;
                }
                f=(c*g)+(s*y);
                x=(c*y)-(s*g);
                for (jj=1; jj<=m; jj++)
                {
                    y=a[jj][j];
                    z=a[jj][i];
                    a[jj][j]=y*c+z*s;
                    a[jj][i]=z*c-y*s;
                }
            }
            rv1[l]=0.0;
            rv1[k]=f;
            w[k]=x;
        }
    }
    free_dvector(rv1,1,n);
}
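The abstract relates the 8:1 compression ratio to a covariance matrix Condition Number of 200. Assuming the usual 2-norm condition number is meant, it can be read directly from the singular values that svdcmp returns in w. The sketch below uses zero-based arrays; the name condition_number and the convention of returning -1.0 for a singular matrix are hypothetical choices made for the illustration.

```c
/* absolute value without needing the maths library */
static double abs_value(double x)
{
    return (x < 0.0) ? -x : x;
}

/* Ratio of the largest to the smallest singular value in w[0..n-1].
   Returns -1.0 when n is not positive or when the smallest singular
   value is zero (the matrix is singular and the ratio is unbounded). */
double condition_number(const double *w, int n)
{
    int i;
    double wmax, wmin;

    if (n <= 0) return -1.0;
    wmax = wmin = abs_value(w[0]);
    for (i = 1; i < n; i++) {
        double a = abs_value(w[i]);
        if (a > wmax) wmax = a;
        if (a < wmin) wmin = a;
    }
    if (wmin == 0.0) return -1.0;
    return wmax / wmin;
}
```

For a symmetric positive semi-definite matrix such as a covariance matrix, the singular values coincide with the eigenvalues, so the same ratio applies to the sorted eigenvalue vector.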


LISTING  SEVEN 


/*  DATA  FILE  CONVERSION  PROGRAM 


Command Line INPUTS : data_snd infile.dat infile.hed outfile.snd


This program will input a specified datafile of ASCII integers and convert
them into bytes which represent the sound. An associated file containing the
sound file's header information is needed.

OUTPUTS  are:  infile.snd 

Program  adapted  by  FLTLT  Don  Dryley  from  a  similar  program  written  by  CAPT 
Jim  Geurts. 

*/ 


#include <stdio.h>
#include <stdlib.h>


typedef struct
{
    int magic;          /* must be equal to SND_MAGIC */
    int dataLocation;   /* Offset or pointer to the raw data */
    int dataSize;       /* Number of bytes of data in the raw data */
    int dataFormat;     /* The data format code */
    int samplingRate;   /* The sampling rate */
    int channelCount;   /* The number of channels */
    char info[4];       /* Textual information relating to the sound. */
} SNDSoundStruct;


main(argc,argv)
int argc;
char *argv[];
{
    SNDSoundStruct snd;
    FILE *infile, *outfile;
    short int *ind;
    int n, number_of_samples, temp;

    /* CHECK COMMAND LINE ARGUMENTS */
    if (argc != 4)
    {
        printf("\n Incorrect Command Format \n Use data_snd infilename.dat infile.hed outfilename.snd\n");
        exit(-1);
    }


    /* LOAD SOUND FILE HEADER FROM HEADER FILE */

    /* open header file */
    infile = fopen(argv[2],"r");
    if (infile == NULL)
    {
        printf("Cannot open %s, Exiting to system\n",argv[2]);
        exit(-1);
    }

    /* Read in the header information */
    fread(&snd,sizeof(SNDSoundStruct),1,infile);
    fclose(infile);

    /* HOW MANY INTEGERS ARE THERE IN FILE ? */

    /* open input data file */
    infile = fopen(argv[1],"r");
    if (infile == NULL)
    {
        printf("Cannot open %s, Exiting to system\n",argv[1]);
        exit(-1);
    }

    /* count integers */
    number_of_samples = 0;
    while (fscanf(infile,"%d",&temp) != EOF) number_of_samples++;


    /* READ DATA FROM FILE INTO AN ARRAY */

    /* set number of bytes to be read */
    snd.dataSize = number_of_samples * sizeof(short);

    /* create array */
    ind = (short int *)malloc((unsigned) snd.dataSize);
    if (!ind)
    {
        printf("allocation failure, Exiting to system\n");
        exit(-1);
    }

    /* read data from file; the array is indexed from zero so that every */
    /* sample lies inside the allocated block and is written out below   */
    rewind(infile);
    for (n=0; n<number_of_samples; n++)
    {
        fscanf(infile,"%d",&temp);
        ind[n] = (short) temp;
    }
    fclose(infile);


    /* WRITE DATA IN SOUND FORMAT TO SOUND FILE */

    /* open output sound file */
    outfile = fopen(argv[3],"w");
    if (outfile == NULL)
    {
        printf("Cannot open %s, Exiting to system\n",argv[3]);
        exit(-1);
    }

    /* write sound header */
    fwrite(&snd,sizeof(SNDSoundStruct),1,outfile);

    /* write data to sound file */
    fwrite(ind,sizeof(char),snd.dataSize,outfile);

    fclose(outfile);
}
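The counting pass in the listing above tests the return value of fscanf against EOF. As a hedged aside, fscanf returns 0 rather than EOF when it meets a token that is not an integer, so a stray character in the data file would keep that loop from terminating. A minimal sketch of a stricter count follows; the function name count_integers is hypothetical.

```c
#include <stdio.h>

/* Count the decimal integers in a text stream. Counting stops at end of
   file or at the first token that cannot be read as an integer. */
long count_integers(FILE *fp)
{
    int value;
    long count = 0;

    /* fscanf returns the number of items converted, so 1 means success */
    while (fscanf(fp, "%d", &value) == 1)
        count++;
    return count;
}
```

The caller would still rewind the file before the second pass, exactly as the listing does.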


LISTING EIGHT


/* SOUND FILE CONVERSION PROGRAM


Command Line INPUTS : sound_to infilename.snd outfilename.hed outfilename.dat


This program will input a specified soundfile and output the ASCII integers
which represent the sound. In addition, an associated file containing the sound
file's header information is created.


OUTPUTS are : speech.dat
            : speech.hed

Program adapted by FLTLT Don Dryley from a similar program written by CAPT
Jim Geurts.

*/


#include <stdio.h>
#include <stdlib.h>


typedef struct
{
    int magic;          /* must be equal to SND_MAGIC */
    int dataLocation;   /* Offset or pointer to the raw data */
    int dataSize;       /* Number of bytes of data in the raw data */
    int dataFormat;     /* The data format code */
    int samplingRate;   /* The sampling rate */
    int channelCount;   /* The number of channels */
    char info[4];       /* Textual information relating to the sound. */
} SNDSoundStruct;


main(argc,argv)
int argc;
char *argv[];
{
    SNDSoundStruct snd;
    FILE *sndfile, *output_file;
    short int *dat;
    int n, number_of_samples;
    char header[] = "snd_head.dat";

    /* Double-checks that an input and output file have been specified */
    if (argc != 4)
    {
        printf("\n Incorrect Command Format \n Use sound_to infile.snd outfile.hed outfile.dat\n");
        exit(-1);
    }


    /* open sound file */
    sndfile = fopen(argv[1],"r");
    if (sndfile == NULL)
    {
        printf("Cannot open %s, Exiting to system\n",argv[1]);
        exit(-1);
    }

    /* Read in the header information */
    fread(&snd,sizeof(SNDSoundStruct),1,sndfile);

    /* Open file for header information, and save header structure */
    output_file = fopen(argv[2],"w");
    if (output_file == NULL)
    {
        printf("Cannot open %s, Exiting to system\n",argv[2]);
        exit(-1);
    }
    fwrite(&snd,sizeof(SNDSoundStruct),1,output_file);
    fclose(output_file);

    /* Read in the digitized data from sound file */
    number_of_samples = snd.dataSize/2;
    dat = (short int *)malloc(number_of_samples*sizeof(short int));
    fseek(sndfile,snd.dataLocation,0);
    fread(dat,sizeof(short int),number_of_samples,sndfile);

    /* Open output file and output the data in ASCII form */
    output_file = fopen(argv[3],"w");
    if (output_file == NULL)
    {
        printf("Cannot open %s. Exiting to system\n",argv[3]);
        exit(-1);
    }

    for (n=0; n<number_of_samples; n++)
        fprintf(output_file,"%d\n",dat[n]);
    fclose(output_file);
}
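The listing above reads the header with a single fread into the structure, which relies on the host having 32-bit integers in the same byte order as the file. On the assumption that the files follow the standard NeXT/Sun sound layout (six big-endian 32-bit words beginning with the magic number 0x2e736e64, the ASCII bytes ".snd"), a host-independent way of decoding the first three fields can be sketched as follows. The names read_be32, parse_snd_header and SND_MAGIC_VALUE are hypothetical.

```c
#define SND_MAGIC_VALUE 0x2e736e64UL   /* the ASCII bytes ".snd" */

/* Assemble a 32-bit big-endian value from four consecutive bytes. */
unsigned long read_be32(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] << 8)  |  (unsigned long)p[3];
}

/* Decode the start of a 24-byte sound header. Returns 1 and stores the
   data offset and data size if the magic number matches, otherwise
   returns 0 and leaves the outputs untouched. */
int parse_snd_header(const unsigned char hdr[24],
                     unsigned long *offset, unsigned long *size)
{
    if (read_be32(hdr) != SND_MAGIC_VALUE) return 0;
    *offset = read_be32(hdr + 4);
    *size = read_be32(hdr + 8);
    return 1;
}
```

On the NeXT hardware used for the study the direct fread is sufficient; the byte-wise form only matters if the programs are moved to a little-endian machine.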


APPENDIX D


MEAN  SQUARE  ERROR  PROGRAM 


/*


Command Line Inputs : reduce speech.dat


Speech source files are first converted from the NeXT sound format into a
data format using the program sound_to. Sound_to creates two files, a header
file which contains the source file's header information and a data file of
integers between -32768 and 32767.

This program uses the original sound file's header with the file size
adjusted to the reconstructed file's size (usually not the same size as the
original).

The program prompts the user for the number of coefficients used for the
reduction experiment and writes this value to an error file along with the MSE.


Outputs : temp_2.dat
        : error_2.dat


Program written by FLTLT Don Dryley at AFIT Sep 92.

*/


#include <fcntl.h>
#include <stdio.h>
#include <math.h>
#include <string.h>
#include <stdlib.h>
#include "recipes.h"


#define  MAX_AMP  32768 
#define  N  256 
#define  n  128 


typedef struct
{
    int magic;
    int DataLocation;
    int DataSize;
    int DataFormat;
    int SamplingRate;
    int ChannelCount;
    char info[4];
} SNDheader;


void transform_(trans_R,trans_I,dim,in_R,in_I,size,KL_R,KL_I)
double *trans_R,*trans_I,*in_R,*in_I,**KL_R,**KL_I;
int dim,size;

/* function transforms two input vectors of dimension size */
/* into two vectors of coefficients of dimension dim using */
/* the transformation matrices KL_R and KL_I               */

{
    int i,j;

    /* transform = KL multiplied by in_vector */
    for (i=1; i<=dim; i++)
    {
        trans_R[i] = trans_I[i] = 0.0;
        for (j=1; j<=size; j++)
        {
            trans_R[i] += KL_R[i][j] * in_R[j];
            trans_I[i] += KL_I[i][j] * in_I[j];
        }
    }
}


void inverse_transform_(inv_R,inv_I,size,red_R,red_I,dim,KL_R,KL_I)
double *inv_R,*inv_I,*red_R,*red_I,**KL_R,**KL_I;
int size,dim;

/* function inverse transforms two vectors of coefficients of */
/* dimension dim into two output vectors of dimension size    */
/* using the transformation matrices KL_R and KL_I            */

{
    int i,j;

    /* inverse transform = KLI multiplied by coefficients */
    for (i=1; i<=size; i++)
    {
        inv_R[i] = inv_I[i] = 0.0;
        for (j=1; j<=dim; j++)
        {
            inv_R[i] += KL_R[j][i] * red_R[j];
            inv_I[i] += KL_I[j][i] * red_I[j];
        }
    }
}


void error_spectrum_(size,org_R,org_I,rec_R,rec_I,error_R,error_I)
double *org_R, *org_I, *rec_R, *rec_I, *error_R, *error_I;
int size;

/* function generates an error spectrum by, for each frequency,  */
/* taking the difference between the reconstructed and original  */
/* spectrums. Assume that the original, reconstructed, and error */
/* vectors are created prior to execution with the function      */
/* vector                                                        */

{
    int i;

    for (i=1; i<=size; i++)
    {
        error_R[i] = rec_R[i] - org_R[i];
        error_I[i] = rec_I[i] - org_I[i];
    }
}


void main(argc,argv)
int argc;
char *argv[];
{
    SNDheader SND;
    FILE      *inhandle, *outhandle;
    short     *tempdata;
    int       i, j, k, vectors, integers_read, tempint, dim;
    long      *buff, templong;
    float     tempfloat;
    double    **KL_R, **KL_I, *averageR, *averageI, *d, *org_R, *org_I,
              *rec_R, *rec_I, *reduce_R, *reduce_I, DC, magnitude, *ferr_R,
              *ferr_I, *onetimeR, *onetimeI;
    char      KLT_R[] = "KLT_Re.dat",
              KLT_I[] = "KLT_Im.dat",
              AVG_R[] = "avgs_Re.dat",
              AVG_I[] = "avgs_Im.dat",
              temp[]  = "temp_2.dat",
              error[] = "error_2.dat";


    /* CHECK ARGUMENTS */
    if (argc != 2)
    {
        printf("Format: D>reduce speech.dat ");
        exit(-1);
    }

    /* PROMPT FOR NUMBER OF KL COEFFICIENTS */
    printf("Enter number of coefficients <1 - %d> ... ",n);
    scanf("%d",&dim);
    printf("\n\n");


    /* CREATE MATRICES */
    buff     = lvector(1,N);        /* input buffer */
    d        = dvector(1,2*N);      /* FFT of one frame of speech is complex */
    averageR = dvector(1,n);        /* average of discrete frequencies */
    averageI = dvector(1,n);        /* average of discrete frequencies */
    KL_R     = dmatrix(1,n,1,n);    /* hold KL transform */
    KL_I     = dmatrix(1,n,1,n);    /* hold KL transform */
    org_R    = dvector(1,n);        /* holds original frequency vector */
    org_I    = dvector(1,n);        /* holds original frequency vector */
    rec_R    = dvector(1,n);        /* holds reconstructed frequency vector */
    rec_I    = dvector(1,n);        /* holds reconstructed frequency vector */
    reduce_R = dvector(1,n);        /* holds set of coefficients */
    reduce_I = dvector(1,n);        /* holds set of coefficients */
    onetimeR = dvector(1,n);        /* holds error for a single vector */
    onetimeI = dvector(1,n);        /* holds error for a single vector */
    ferr_R   = dvector(1,n);        /* holds accumulated error */
    ferr_I   = dvector(1,n);        /* holds accumulated error */

    /* make sure the error accumulators start at zero */
    for (k=1; k<=n; k++)
        ferr_R[k] = ferr_I[k] = 0.0;


    /* READ TRANSFORM FROM FILE */

    /* open Real transform file */
    inhandle = fopen(KLT_R,"r");
    if (inhandle == NULL)
    {
        printf("Can't open file %s.",KLT_R);
        exit(-1);
    }

    /* load real KL from transform file */
    for (i=1; i<=n; i++)
        for (j=1; j<=n; j++)
        {
            fscanf(inhandle,"%e",&tempfloat);
            KL_R[i][j] = (double) tempfloat;
        }
    fclose(inhandle);

    /* open Imaginary transform file */
    inhandle = fopen(KLT_I,"r");
    if (inhandle == NULL)
    {
        printf("Can't open file %s.",KLT_I);
        exit(-1);
    }

    /* load imaginary KL from transform file */
    for (i=1; i<=n; i++)
        for (j=1; j<=n; j++)
        {
            fscanf(inhandle,"%e",&tempfloat);
            KL_I[i][j] = (double) tempfloat;
        }
    fclose(inhandle);


    /* READ FREQUENCY AVERAGES FROM FILE */

    /* open real averages file */
    inhandle = fopen(AVG_R,"r");
    if (inhandle == NULL)
    {
        printf("Can't open file %s.",AVG_R);
        exit(-1);
    }

    /* load averages from averages file */
    for (j=1; j<=n; j++)
    {
        fscanf(inhandle,"%e",&tempfloat);
        averageR[j] = (double) tempfloat;
    }
    fclose(inhandle);

    /* open imaginary averages file */
    inhandle = fopen(AVG_I,"r");
    if (inhandle == NULL)
    {
        printf("Can't open file %s.",AVG_I);
        exit(-1);
    }

    /* load averages from averages file */
    for (j=1; j<=n; j++)
    {
        fscanf(inhandle,"%e",&tempfloat);
        averageI[j] = (double) tempfloat;
    }
    fclose(inhandle);


    /* KL TRANSFORMATION EXPERIMENT */

    /* open temporary file for data */
    outhandle = fopen(temp,"w");
    if (outhandle == NULL)
    {
        printf("Can't open file %s.",temp);
        exit(-1);
    }

    /* open source file for reading */
    inhandle = fopen(argv[1],"r");
    if (inhandle == NULL)
    {
        printf("Can't open file %s.",argv[1]);
        exit(-1);
    }

    /* count number of vectors in source file */
    integers_read = vectors = 0;
    while (fscanf(inhandle,"%d",&tempint) != EOF)
    {
        integers_read++;
        if (integers_read == N)
        {
            integers_read = 0;
            vectors++;
        }
    }
    rewind(inhandle);

    /* transform the data file */
    integers_read = 0;
    j = 1;

    /* clear FFT array so that the imaginary parts of the first frame are zero */
    for (k=1; k<=2*N; k++)
        d[k] = 0.0;

    while (j <= vectors)
    {
        fscanf(inhandle,"%d",&tempint);
        integers_read++;
        buff[integers_read] = tempint;

        /* transform a vector */
        if (integers_read == N)
        {
            /* load input array to fourier transform */
            for (k=1; k<=N; k++)
            {
                d[2*k-1] = (double) buff[k];
                d[2*k-1] /= MAX_AMP;
            }

            /* overwrite input with complex frequencies */
            four1(d,N,1);

            /* extract real, imaginary and DC components */
            rect(d,n,org_R,org_I,&DC);

            /* subtract mean from each component */
            for (k=1; k<=n; k++)
            {
                org_R[k] -= averageR[k];
                org_I[k] -= averageI[k];
            }

            /* generate KL coefficients */
            transform_(reduce_R,reduce_I,dim,org_R,org_I,n,KL_R,KL_I);

            /* reconstruct frequency vector from KL coefficients */
            inverse_transform_(rec_R,rec_I,n,reduce_R,reduce_I,dim,
                               KL_R,KL_I);

            /* clear input array for inverse FFT */
            for (k=1; k<=2*N; k++)
                d[k] = 0.0;

            /* insert DC value */
            d[1] = DC;

            /* load input array with reconstructed spectrum */
            for (k=2; k<=n+1; k++)
            {
                d[2*k-1] = rec_R[k-1] + averageR[k-1];
                d[2*k] = rec_I[k-1] + averageI[k-1];
            }

            /* upper half of the spectrum is the complex conjugate mirror */
            for (k=2; k<=n; k++)
            {
                d[N+2*k-1] = rec_R[n-k+1] + averageR[n-k+1];
                d[N+2*k] = -1.0 * (rec_I[n-k+1] + averageI[n-k+1]);
            }

            /* reconstruct time domain data */
            four1(d,N,-1);

            /* write reconstructed time domain data to output file */
            for (k=1; k<=N; k++)
            {
                d[2*k-1] *= MAX_AMP;
                d[2*k-1] /= N;
                fprintf(outhandle,"%d\n",(int) d[2*k-1]);
            }

            /* generate error_spectrum for this vector */
            error_spectrum_(n,org_R,org_I,rec_R,rec_I,onetimeR,onetimeI);

            /* accumulate error */
            for (k=1; k<=n; k++)
            {
                ferr_R[k] += onetimeR[k];
                ferr_I[k] += onetimeI[k];
            }

            /* clear input array for next vector */
            for (k=1; k<=2*N; k++)
                d[k] = 0.0;

            /* increment vector counter */
            j++;

            /* reset elements per vector counter */
            integers_read = 0;
        }
    }

    fclose(inhandle);
    fclose(outhandle);


    /* DETERMINE ERROR FOR EXPERIMENT */

    /* open file for error data */
    outhandle = fopen(error,"a");
    if (outhandle == NULL)
    {
        printf("Can't open file %s.",error);
        exit(-1);
    }

    /* write number of coefficients used this experiment */
    fprintf(outhandle,"%d ",dim);

    /* determine average error per frequency for this file */
    for (k=1; k<=n; k++)
    {
        ferr_R[k] /= vectors;
        ferr_I[k] /= vectors;
    }

    /* find absolute value of real spectrum and write to file */
    magnitude = 0.0;
    for (k=1; k<=n; k++)
        magnitude += ferr_R[k] * ferr_R[k];

    DC = sqrt(magnitude/n);
    fprintf(outhandle,"%e ",DC);

    /* find absolute value of imaginary spectrum and write to file */
    magnitude = 0.0;
    for (k=1; k<=n; k++)
        magnitude += ferr_I[k] * ferr_I[k];

    DC = sqrt(magnitude/n);
    fprintf(outhandle,"%e \n",DC);

    fclose(outhandle);
}
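The compression step in the program above keeps only the first dim coefficients of each transformed vector and reconstructs from them with the transposed matrix. That reconstruction is exact when dim equals the vector length only if the rows of the transform matrix are orthonormal, as eigenvectors of a symmetric covariance matrix are. A minimal zero-based sketch follows; kl_forward, kl_inverse and SIZE are hypothetical names, and the 2 by 2 basis in the usage note is made up purely for illustration.

```c
#define SIZE 2   /* hypothetical vector length for the illustration */

/* coeff[i] = sum over j of basis[i][j] * in[j], for the first dim rows */
void kl_forward(double coeff[], int dim, const double in[SIZE],
                double basis[SIZE][SIZE])
{
    int i, j;

    for (i = 0; i < dim; i++) {
        coeff[i] = 0.0;
        for (j = 0; j < SIZE; j++)
            coeff[i] += basis[i][j] * in[j];
    }
}

/* out[i] = sum over j of basis[j][i] * coeff[j]; this uses the transpose
   as the inverse, which is only valid for orthonormal basis rows */
void kl_inverse(double out[SIZE], const double coeff[], int dim,
                double basis[SIZE][SIZE])
{
    int i, j;

    for (i = 0; i < SIZE; i++) {
        out[i] = 0.0;
        for (j = 0; j < dim; j++)
            out[i] += basis[j][i] * coeff[j];
    }
}
```

With the basis rows (0.6, 0.8) and (-0.8, 0.6) and the input (1, 2), keeping both coefficients returns the input to within rounding, while keeping only the first coefficient returns its projection onto the first row, approximately (1.32, 1.76).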
/* RELATIVE MEAN SQUARE ERROR PROGRAM


Command Line Inputs : reduce speech.dat


Speech source files are first converted from the NeXT sound format into a
data format using the program sound_to. Sound_to creates two files, a header
file which contains the source file's header information and a data file of
integers between -32768 and 32767.

This program uses the original sound file's header with the file size
adjusted to the reconstructed file's size (usually not the same size as the
original).

The program prompts the user for the number of coefficients used for the
reduction experiment and writes this value to an error file along with the RMSE.


Outputs : temp_2.dat
        : error_2.dat


Program written by FLTLT Don Dryley at AFIT Sep 92.

*/


#include <fcntl.h>
#include <stdio.h>
#include <math.h>
#include <string.h>
#include <stdlib.h>
#include "recipes.h"


#define  MAX_AMP  32768 
#define  N  256 
#define  n  128 


typedef struct
{
    int magic;
    int DataLocation;
    int DataSize;
    int DataFormat;
    int SamplingRate;
    int ChannelCount;
    char info[4];
} SNDheader;

void transform_(trans_R,trans_I,dim,in_R,in_I,size,KL_R,KL_I)
double *trans_R,*trans_I,*in_R,*in_I,**KL_R,**KL_I;
int dim,size;

/* function transforms two input vectors of dimension size */
/* into two vectors of coefficients of dimension dim using */
/* the transformation matrices KL_R and KL_I               */

{
    int i,j;

    /* transform = KL multiplied by in_vector */
    for (i=1; i<=dim; i++)
    {
        trans_R[i] = trans_I[i] = 0.0;
        for (j=1; j<=size; j++)
        {
            trans_R[i] += KL_R[i][j] * in_R[j];
            trans_I[i] += KL_I[i][j] * in_I[j];
        }
    }
}

void inverse_transform_(inv_R,inv_I,size,red_R,red_I,dim,KL_R,KL_I)
double *inv_R,*inv_I,*red_R,*red_I,**KL_R,**KL_I;
int size,dim;

/* function inverse transforms two vectors of coefficients of */
/* dimension dim into two output vectors of dimension size    */
/* using the transformation matrices KL_R and KL_I            */

{
    int i,j;

    /* inverse transform = KLI multiplied by coefficients */
    for (i=1; i<=size; i++)
    {
        inv_R[i] = inv_I[i] = 0.0;
        for (j=1; j<=dim; j++)
        {
            inv_R[i] += KL_R[j][i] * red_R[j];
            inv_I[i] += KL_I[j][i] * red_I[j];
        }
    }
}

void error_spectrum_(size,org_R,org_I,rec_R,rec_I,error_R,error_I)
double *org_R, *org_I, *rec_R, *rec_I, *error_R, *error_I;
int size;

/* function generates a normalised error spectrum by, for each  */
/* frequency, dividing the difference between the original and  */
/* reconstructed spectrums by the original spectrum. Assume     */
/* that the original, reconstructed, and error vectors are      */
/* created prior to execution with the function vector          */

{
    int i;

    for (i=1; i<=size; i++)
    {
        if (org_R[i] != 0.0)
            error_R[i] = (org_R[i] - rec_R[i])/org_R[i];
        else if (rec_R[i] != 0.0)
            error_R[i] = (org_R[i] - rec_R[i])/rec_R[i];
        else error_R[i] = 0.0;

        if (org_I[i] != 0.0)
            error_I[i] = (org_I[i] - rec_I[i])/org_I[i];
        else if (rec_I[i] != 0.0)
            error_I[i] = (org_I[i] - rec_I[i])/rec_I[i];
        else error_I[i] = 0.0;
    }
}


void main(argc,argv)
int argc;
char *argv[];
{
    SNDheader SND;
    FILE      *inhandle, *outhandle;
    short     *tempdata;
    int       i, j, k, vectors, integers_read, tempint, dim;
    long      *buff, templong;
    float     tempfloat;
    double    **KL_R, **KL_I, *averageR, *averageI, *d, *org_R, *org_I,
              *rec_R, *rec_I, *reduce_R, *reduce_I, DC, magnitude, *ferr_R,
              *ferr_I, *onetimeR, *onetimeI;
    char      KLT_R[] = "KLT_Re.dat",
              KLT_I[] = "KLT_Im.dat",
              AVG_R[] = "avgs_Re.dat",
              AVG_I[] = "avgs_Im.dat",
              temp[]  = "temp_2.dat",
              error[] = "error_2.dat";


    /* CHECK ARGUMENTS */
    if (argc != 2)
    {
        printf("Format: D>reduce speech.dat ");
        exit(-1);
    }

    /* PROMPT FOR NUMBER OF KL COEFFICIENTS */
    printf("Enter number of coefficients <1 - %d> ... ",n);
    scanf("%d",&dim);
    printf("\n\n");


    /* CREATE MATRICES */
    buff     = lvector(1,N);        /* input buffer */
    d        = dvector(1,2*N);      /* FFT of one frame of speech is complex */
    averageR = dvector(1,n);        /* average of discrete frequencies */
    averageI = dvector(1,n);        /* average of discrete frequencies */
    KL_R     = dmatrix(1,n,1,n);    /* hold KL transform */
    KL_I     = dmatrix(1,n,1,n);    /* hold KL transform */
    org_R    = dvector(1,n);        /* holds original frequency vector */
    org_I    = dvector(1,n);        /* holds original frequency vector */
    rec_R    = dvector(1,n);        /* holds reconstructed frequency vector */
    rec_I    = dvector(1,n);        /* holds reconstructed frequency vector */
    reduce_R = dvector(1,n);        /* holds set of coefficients */
    reduce_I = dvector(1,n);        /* holds set of coefficients */
    onetimeR = dvector(1,n);        /* holds error for a single vector */
    onetimeI = dvector(1,n);        /* holds error for a single vector */
    ferr_R   = dvector(1,n);        /* holds accumulated error */
    ferr_I   = dvector(1,n);        /* holds accumulated error */

    /* make sure the error accumulators start at zero */
    for (k=1; k<=n; k++)
        ferr_R[k] = ferr_I[k] = 0.0;


    /* READ TRANSFORM FROM FILE */

    /* open Real transform file */
    inhandle = fopen(KLT_R,"r");
    if (inhandle == NULL)
    {
        printf("Can't open file %s.",KLT_R);
        exit(-1);
    }

    /* load real KL from transform file */
    for (i=1; i<=n; i++)
        for (j=1; j<=n; j++)
        {
            fscanf(inhandle,"%e",&tempfloat);
            KL_R[i][j] = (double) tempfloat;
        }
    fclose(inhandle);

    /* open Imaginary transform file */
    inhandle = fopen(KLT_I,"r");
    if (inhandle == NULL)
    {
        printf("Can't open file %s.",KLT_I);
        exit(-1);
    }

    /* load imaginary KL from transform file */
    for (i=1; i<=n; i++)
        for (j=1; j<=n; j++)
        {
            fscanf(inhandle,"%e",&tempfloat);
            KL_I[i][j] = (double) tempfloat;
        }
    fclose(inhandle);


    /* READ FREQUENCY AVERAGES FROM FILE */

    /* open real averages file */
    inhandle = fopen(AVG_R,"r");
    if (inhandle == NULL)
    {
        printf("Can't open file %s.",AVG_R);
        exit(-1);
    }

    /* load averages from averages file */
    for (j=1; j<=n; j++)
    {
        fscanf(inhandle,"%e",&tempfloat);
        averageR[j] = (double) tempfloat;
    }
    fclose(inhandle);

    /* open imaginary averages file */
    inhandle = fopen(AVG_I,"r");
    if (inhandle == NULL)
    {
        printf("Can't open file %s.",AVG_I);
        exit(-1);
    }

    /* load averages from averages file */
    for (j=1; j<=n; j++)
    {
        fscanf(inhandle,"%e",&tempfloat);
        averageI[j] = (double) tempfloat;
    }
    fclose(inhandle);


    /* KL TRANSFORMATION EXPERIMENT */

    /* open temporary file for data */
    outhandle = fopen(temp,"w");
    if (outhandle == NULL)
    {
        printf("Can't open file %s.",temp);
        exit(-1);
    }

    /* open source file for reading */
    inhandle = fopen(argv[1],"r");
    if (inhandle == NULL)
    {
        printf("Can't open file %s.",argv[1]);
        exit(-1);
    }

    /* count number of vectors in source file */
    integers_read = vectors = 0;
    while (fscanf(inhandle,"%d",&tempint) != EOF)
    {
        integers_read++;
        if (integers_read == N)
        {
            integers_read = 0;
            vectors++;
        }
    }
    rewind(inhandle);

    /* transform the data file */
    integers_read = 0;
    j = 1;

    /* clear FFT array so that the imaginary parts of the first frame are zero */
    for (k=1; k<=2*N; k++)
        d[k] = 0.0;

    while (j <= vectors)
    {
        fscanf(inhandle,"%d",&tempint);
        integers_read++;
        buff[integers_read] = tempint;

        /* transform a vector */
        if (integers_read == N)
        {
            /* load input array to fourier transform */
            for (k=1; k<=N; k++)
            {
                d[2*k-1] = (double) buff[k];
                d[2*k-1] /= MAX_AMP;
            }

            /* overwrite input with complex frequencies */
            four1(d,N,1);

            /* extract real, imaginary and DC components */
            rect(d,n,org_R,org_I,&DC);

            /* subtract mean from each component */
            for (k=1; k<=n; k++)
            {
                org_R[k] -= averageR[k];
                org_I[k] -= averageI[k];
            }

            /* generate KL coefficients */
            transform_(reduce_R,reduce_I,dim,org_R,org_I,n,KL_R,KL_I);

            /* reconstruct frequency vector from KL coefficients */
            inverse_transform_(rec_R,rec_I,n,reduce_R,reduce_I,dim,
                               KL_R,KL_I);

            /* clear input array for inverse FFT */
            for (k=1; k<=2*N; k++)
                d[k] = 0.0;

            /* insert DC value */
            d[1] = DC;

            /* load input array with reconstructed spectrum */
            for (k=2; k<=n+1; k++)
            {
                d[2*k-1] = rec_R[k-1] + averageR[k-1];
                d[2*k] = rec_I[k-1] + averageI[k-1];
            }

            /* upper half of the spectrum is the complex conjugate mirror */
            for (k=2; k<=n; k++)
            {
                d[N+2*k-1] = rec_R[n-k+1] + averageR[n-k+1];
                d[N+2*k] = -1.0 * (rec_I[n-k+1] + averageI[n-k+1]);
            }

            /* reconstruct time domain data */
            four1(d,N,-1);

            /* write reconstructed time domain data to output file */
            for (k=1; k<=N; k++)
            {
                d[2*k-1] *= MAX_AMP;
                d[2*k-1] /= N;
                fprintf(outhandle,"%d\n",(int) d[2*k-1]);
            }

            /* generate error_spectrum for this vector */
            error_spectrum_(n,org_R,org_I,rec_R,rec_I,onetimeR,onetimeI);

            /* accumulate error */
            for (k=1; k<=n; k++)
            {
                ferr_R[k] += onetimeR[k];
                ferr_I[k] += onetimeI[k];
            }

            /* clear input array for next vector */
            for (k=1; k<=2*N; k++)
                d[k] = 0.0;

            /* increment vector counter */
            j++;

            /* reset elements per vector counter */
            integers_read = 0;
        }
    }

    fclose(inhandle);
    fclose(outhandle);


    /* DETERMINE ERROR FOR EXPERIMENT */

    /* open file for error data */
    outhandle = fopen(error,"a");
    if (outhandle == NULL)
    {
        printf("Can't open file %s.",error);
        exit(-1);
    }

    /* write number of coefficients used this experiment */
    fprintf(outhandle,"%d ",dim);

    /* determine average error per frequency for this file */
    for (k=1; k<=n; k++)
    {
        ferr_R[k] /= vectors;
        ferr_I[k] /= vectors;
    }

    /* find absolute value of real spectrum and write to file */
    magnitude = 0.0;
    for (k=1; k<=n; k++)
        magnitude += ferr_R[k] * ferr_R[k];

    DC = sqrt(magnitude/n);
    fprintf(outhandle,"%e ",DC);

    /* find absolute value of imaginary spectrum and write to file */
    magnitude = 0.0;
    for (k=1; k<=n; k++)
        magnitude += ferr_I[k] * ferr_I[k];

    DC = sqrt(magnitude/n);
    fprintf(outhandle,"%e \n",DC);

    fclose(outhandle);
}
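The relative error used by error_spectrum_ in this program divides by the original component and falls back to the reconstructed component when the original is exactly zero. The same rule for a single value can be sketched as below; the function name relative_error is hypothetical.

```c
/* Relative error of a reconstructed value against the original, with the
   same zero guards as the listing: divide by the original if it is
   non-zero, otherwise by the reconstruction, otherwise report zero. */
double relative_error(double org, double rec)
{
    if (org != 0.0) return (org - rec) / org;
    if (rec != 0.0) return (org - rec) / rec;
    return 0.0;
}
```

Because the result is signed, errors of opposite sign can cancel when they are summed over many vectors before the root mean square is taken, which is worth keeping in mind when reading the accumulated values.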


RMSE  PER  COEFFICIENT  PROGRAM 


/*


Command Line Inputs : reduce speech.dat


Speech source files are first converted from the NeXT sound format into a
data format using the program sound_to. Sound_to creates two files, a header
file which contains the source file's header information and a data file of
integers between -32768 and 32767.

This program uses the original sound file's header with the file size
adjusted to the reconstructed file's size (usually not the same size as the
original).

The program prompts the user for the number of coefficients used for the
reduction experiment and writes this value to an error file along with the RMSE
per coefficient.


Outputs : temp_2.dat
        : error_2.dat


Program written by FLTLT Don Dryley at AFIT Sep 92.

*/


#include <fcntl.h>
#include <stdio.h>
#include <math.h>
#include <string.h>
#include <stdlib.h>
#include "recipes.h"


#define  MAX_AMP  32768 
#define  N  256 
#define  n  128 


typedef struct
{
    int magic;
    int DataLocation;
    int DataSize;
    int DataFormat;
    int SamplingRate;
    int ChannelCount;
    char info[4];
} SNDheader;




void transform_(trans_R,trans_I,dim,in_R,in_I,size,KL_R,KL_I)
double *trans_R,*trans_I,*in_R,*in_I,**KL_R,**KL_I;
int dim, size;

/* function transforms two input vectors of dimension size */
/* into two vectors of coefficients of dimension dim using */
/* the transformation matrices KL_R and KL_I */

{
    int i,j;

    /* transform = KL multiplied by in_vector */
    for (i=1; i<=dim; i++)
    {
        trans_R[i] = trans_I[i] = 0.0;
        for (j=1; j<=size; j++)
        {
            trans_R[i] += KL_R[i][j] * in_R[j];
            trans_I[i] += KL_I[i][j] * in_I[j];
        }
    }
}


void inverse_transform_(inv_R,inv_I,size,red_R,red_I,dim,KL_R,KL_I)
double *inv_R,*inv_I,*red_R,*red_I,**KL_R,**KL_I;
int size, dim;

/* function inverse transforms two vectors of coefficients of */
/* dimension dim into two output vectors of dimension size */
/* using the transformation matrices KL_R and KL_I */

{
    int i,j;

    /* inverse transform = KL transpose multiplied by coefficients */
    for (i=1; i<=size; i++)
    {
        inv_R[i] = inv_I[i] = 0.0;
        for (j=1; j<=dim; j++)
        {
            inv_R[i] += KL_R[j][i] * red_R[j];
            inv_I[i] += KL_I[j][i] * red_I[j];
        }
    }
}


void error_spectrum_(size,org_R,org_I,rec_R,rec_I,error_R,error_I)
double *org_R, *org_I, *rec_R, *rec_I, *error_R, *error_I;
int size;

/* function generates a normalised error spectrum by, for each */
/* frequency, dividing the difference between the original and */
/* reconstructed spectrums by the original spectrum. Assume */
/* that the original, reconstructed, and error vectors */
/* are created prior to execution with the function vector */

{
    int i;

    for (i=1; i<=size; i++)
    {
        if (org_R[i] != 0.0)
            error_R[i] = (org_R[i] - rec_R[i])/org_R[i];
        else if (rec_R[i] != 0.0)
            error_R[i] = (org_R[i] - rec_R[i])/rec_R[i];
        else error_R[i] = 0.0;

        if (org_I[i] != 0.0)
            error_I[i] = (org_I[i] - rec_I[i])/org_I[i];
        else if (rec_I[i] != 0.0)
            error_I[i] = (org_I[i] - rec_I[i])/rec_I[i];
        else error_I[i] = 0.0;
    }
}


void main(argc,argv)
int argc;
char *argv[];

{
    SNDheader SND;
    FILE *inhandle, *outhandle;
    short *tempdata;
    int i, j, k, vectors, integers_read, tempint, dim;
    long *buff, templong;
    float tempfloat;
    double **KL_R, **KL_I, *averageR, *averageI, *d, *org_R, *org_I,
           *rec_R, *rec_I, *reduce_R, *reduce_I, DC, magnitude, *ferr_R,
           *ferr_I, *onetimeR, *onetimeI;
    char KLT_R[] = "KLT_Re.dat",
         KLT_I[] = "KLT_Im.dat",
         AVG_R[] = "avgs_Re.dat",
         AVG_I[] = "avgs_Im.dat",
         temp[] = "temp_2.dat",
         error[] = "error_2.dat";


    /* CHECK ARGUMENTS */
    if (argc != 2)
    {   printf("Format: D>reduce speech.dat ");
        exit(-1);
    }

    /* PROMPT FOR NUMBER OF KL COEFFICIENTS */
    printf("Enter number of coefficients <1 - %d> ... ",n);
    scanf("%d",&dim);
    printf("\n\n");


    /* CREATE MATRICES */
    buff = lvector(1,N);        /* input buffer */
    d = dvector(1,2*N);         /* FFT of one frame of speech is complex */
    averageR = dvector(1,n);    /* average of real frequency components */
    averageI = dvector(1,n);    /* average of imaginary frequency components */
    KL_R = dmatrix(1,n,1,n);    /* holds KL transform for real components */
    KL_I = dmatrix(1,n,1,n);    /* holds KL transform for imaginary components */
    org_R = dvector(1,n);       /* holds original frequency vector */
    org_I = dvector(1,n);       /* holds original frequency vector */
    rec_R = dvector(1,n);       /* holds reconstructed frequency vector */
    rec_I = dvector(1,n);       /* holds reconstructed frequency vector */
    reduce_R = dvector(1,n);    /* holds set of coefficients */
    reduce_I = dvector(1,n);    /* holds set of coefficients */
    onetimeR = dvector(1,n);    /* holds error for a single vector */
    onetimeI = dvector(1,n);    /* holds error for a single vector */
    ferr_R = dvector(1,n);      /* holds accumulated error */
    ferr_I = dvector(1,n);      /* holds accumulated error */


    /* READ TRANSFORM FROM FILE */

    /* open Real transform file */
    inhandle = fopen(KLT_R,"r");
    if (inhandle == NULL)
    {   printf("Can't open file %s.",KLT_R);
        exit(-1);
    }

    /* load real KL from transform file */
    for (i=1; i<=n; i++)
        for (j=1; j<=n; j++)
        {
            fscanf(inhandle,"%e",&tempfloat);
            KL_R[i][j] = (double) tempfloat;
        }
    fclose(inhandle);

    /* open Imaginary transform file */
    inhandle = fopen(KLT_I,"r");
    if (inhandle == NULL)
    {   printf("Can't open file %s.",KLT_I);
        exit(-1);
    }

    /* load imaginary KL from transform file */
    for (i=1; i<=n; i++)
        for (j=1; j<=n; j++)
        {
            fscanf(inhandle,"%e",&tempfloat);
            KL_I[i][j] = (double) tempfloat;
        }
    fclose(inhandle);


    /* READ FREQUENCY AVERAGES FROM FILE */

    /* open real averages file */
    inhandle = fopen(AVG_R,"r");
    if (inhandle == NULL)
    {   printf("Can't open file %s.",AVG_R);
        exit(-1);
    }

    /* load averages from averages file */
    for (j=1; j<=n; j++)
    {
        fscanf(inhandle,"%e",&tempfloat);
        averageR[j] = (double) tempfloat;
    }
    fclose(inhandle);

    /* open imaginary averages file */
    inhandle = fopen(AVG_I,"r");
    if (inhandle == NULL)
    {   printf("Can't open file %s.",AVG_I);
        exit(-1);
    }

    /* load averages from averages file */
    for (j=1; j<=n; j++)
    {
        fscanf(inhandle,"%e",&tempfloat);
        averageI[j] = (double) tempfloat;
    }
    fclose(inhandle);


    /* KL TRANSFORMATION EXPERIMENT */

    /* open temporary file for data */
    outhandle = fopen(temp,"w");
    if (outhandle == NULL)
    {   printf("Can't open file %s.",temp);
        exit(-1);
    }

    /* open source file for reading */
    inhandle = fopen(argv[1],"r");
    if (inhandle == NULL)
    {   printf("Can't open file %s.",argv[1]);
        exit(-1);
    }

    /* count number of vectors in source file */
    integers_read = vectors = 0;
    while (fscanf(inhandle,"%d",&tempint) != EOF)
    {
        integers_read++;
        if (integers_read == N)
        {
            integers_read = 0;
            vectors++;
        }
    }
    rewind(inhandle);

    /* transform the data file */
    integers_read = 0;
    j = 1;
    while (j <= vectors)
    {
        fscanf(inhandle,"%d",&tempint);
        integers_read++;
        buff[integers_read] = tempint;

        /* transform a vector */
        if (integers_read == N)
        {
            /* load input array to fourier transform */
            for (k=1; k<=N; k++)
            {
                d[2*k-1] = (double) buff[k];
                d[2*k-1] /= MAX_AMP;
            }

            /* overwrite input with complex frequencies */
            four1(d,N,1);

            /* extract real, imaginary and DC components */
            rect(d,n,org_R,org_I,&DC);

            /* subtract mean from each component */
            for (k=1; k<=n; k++)
            {
                org_R[k] -= averageR[k];
                org_I[k] -= averageI[k];
            }

            /* generate KL coefficients */
            transform_(reduce_R,reduce_I,dim,org_R,org_I,n,KL_R,KL_I);

            /* reconstruct frequency vector from KL coefficients */
            inverse_transform_(rec_R,rec_I,n,reduce_R,reduce_I,dim,
                               KL_R,KL_I);

            /* clear input array for inverse FFT */
            for (k=1; k<=2*N; k++)
                d[k] = 0.0;

            /* insert DC value */
            d[1] = DC;

            /* load input array with reconstructed spectrum */
            for (k=2; k<=n+1; k++)
            {
                d[2*k-1] = rec_R[k-1] + averageR[k-1];
                d[2*k] = rec_I[k-1] + averageI[k-1];
            }

            /* load complex conjugates for the upper half of the spectrum */
            for (k=2; k<=n; k++)
            {
                d[N+2*k-1] = rec_R[n-k+1] + averageR[n-k+1];
                d[N+2*k] = -1.0 * (rec_I[n-k+1] + averageI[n-k+1]);
            }

            /* reconstruct time domain data */
            four1(d,N,-1);

            /* write reconstructed time domain data to output file */
            for (k=1; k<=N; k++)
            {
                d[2*k-1] *= MAX_AMP;
                d[2*k-1] /= N;
                fprintf(outhandle,"%d\n",(int) d[2*k-1]);
            }

            /* generate error_spectrum for this vector */
            error_spectrum_(n,org_R,org_I,rec_R,rec_I,onetimeR,onetimeI);

            /* accumulate error */
            for (k=1; k<=n; k++)
            {
                ferr_R[k] += onetimeR[k];
                ferr_I[k] += onetimeI[k];
            }

            /* clear input array for next vector */
            for (k=1; k<=2*N; k++)
                d[k] = 0.0;

            /* increment vector counter */
            j++;

            /* reset elements per vector counter */
            integers_read = 0;
        }
    }

    fclose(inhandle);
    fclose(outhandle);


    /* DETERMINE ERROR FOR EXPERIMENT */

    /* open file for error data */
    outhandle = fopen(error,"a");
    if (outhandle == NULL)
    {   printf("Can't open file %s.",error);
        exit(-1);
    }

    /* write number of coefficients used this experiment */
    fprintf(outhandle,"%d ",dim);

    /* determine average error per frequency for this file */
    for (k=1; k<=n; k++)
    {
        ferr_R[k] /= vectors;
        ferr_I[k] /= vectors;
    }

    /* find RMS value of real error spectrum, per coefficient, and write to file */
    magnitude = 0.0;
    for (k=1; k<=n; k++)
        magnitude += ferr_R[k] * ferr_R[k];

    DC = sqrt(magnitude/n)/dim;
    fprintf(outhandle,"%e ",DC);

    /* find RMS value of imaginary error spectrum, per coefficient, and write to file */
    magnitude = 0.0;
    for (k=1; k<=n; k++)
        magnitude += ferr_I[k] * ferr_I[k];

    DC = sqrt(magnitude/n)/dim;
    fprintf(outhandle,"%e \n",DC);

    fclose(outhandle);
}
APPENDIX C


LISTING ONE


/* CONDITION NUMBER PROGRAM


Command Line Inputs : Nil


The following program decomposes the two covariance matrices into the
product of three matrices V.W.V^T. The rows of the KL transform matrices are
formed from the columns of V (eigenvectors) and the singular values (equal to
the eigenvalues for a symmetric positive semi-definite covariance matrix) are
contained in the diagonal matrix W.

The singular values are ranked in descending order. The ratio of the
largest SV to the nth SV is the condition number. Pairs of condition
numbers for each decomposed covariance matrix are averaged and written to a
file.


OUTPUTS : File KLT_Re.dat
        : File KLT_Im.dat
        : accum_Im.dat


Program written by FLTLT Don Dryley at AFIT, Sep 1992

*/


#include <fcntl.h>
#include <stdio.h>
#include <math.h>
#include <string.h>
#include <stdlib.h>
#include "recipes.h"


#define MAX_AMP 32768
#define N 256
#define n 128


void main(argc,argv)
int argc;
char *argv[];

{
    FILE *inhandle, *out1handle, *out2handle;
    int i, j, k, x, vectors, integers_read, tempint;
    long *buff;
    float tempfloat;
    double *d, *data_R, *data_I, *averageR, *averageI, **A_R, **A_I, *W,
           *S, **V, **Cov, DC, temp, zero = 0.0;
    char covarianceR[] = "covar_Re.dat",
         covarianceI[] = "covar_Im.dat",
         transformR[] = "KLT_Re.dat",
         transformI[] = "KLT_Im.dat",
         eigenvalR[] = "eig_Re.dat",
         eigenvalI[] = "eig_Im.dat",
         accumI[] = "accum_Im.dat";


    /* CREATE ARRAYS, INITIALISED TO ZERO */
    buff = lvector(1,N);        /* input buffer */
    d = dvector(1,2*N);         /* FFT of speech data is complex */
    data_R = dvector(1,n);      /* Single vector of real components */
    data_I = dvector(1,n);      /* Single vector of imaginary components */
    averageR = dvector(1,n);    /* average of real components */
    averageI = dvector(1,n);    /* average of imaginary components */
    A_R = dmatrix(1,n,1,n);     /* covariance matrix of real components */
    A_I = dmatrix(1,n,1,n);     /* covariance matrix of imaginary components */
    V = dmatrix(1,n,1,n);       /* Matrix of eigenvectors */
    W = dvector(1,n);           /* singular values of real covariance matrix */
    S = dvector(1,n);           /* singular values of imaginary covariance matrix */


    /* LOAD COVARIANCE MATRICES */

    /* open Real covariance file */
    out1handle = fopen(covarianceR,"r");
    if (out1handle == NULL)
    {
        printf("Can't open file %s. Exiting to system\n",covarianceR);
        exit(-1);
    }

    /* load covariance array */
    for (i=1; i<=n; i++)
        for (j=1; j<=n; j++)
        {
            fscanf(out1handle,"%e",&tempfloat);
            A_R[i][j] = (double) tempfloat;
        }
    fclose(out1handle);

    /* open Imaginary covariance file */
    out2handle = fopen(covarianceI,"r");
    if (out2handle == NULL)
    {
        printf("Can't open file %s. Exiting to system\n",covarianceI);
        exit(-1);
    }

    /* load covariance array */
    for (i=1; i<=n; i++)
        for (j=1; j<=n; j++)
        {
            fscanf(out2handle,"%e",&tempfloat);
            A_I[i][j] = (double) tempfloat;
        }
    fclose(out2handle);


    /* FORM KL TRANSFORM OF REAL VECTORS */

    /* decompose matrix and rank eigenvalues/vectors of A_R */
    svdcmp(A_R,n,n,W,V);
    eigsrt(W,A_R,n);

    /* open KL_R transform file */
    out1handle = fopen(transformR,"w");
    if (out1handle == NULL)
    {   printf("Can't open file %s",transformR);
        exit(-1);
    }

    /* write eigenvectors to KL_R transform file */
    for (i=1; i<=n; i++)
        for (j=1; j<=n; j++)
            fprintf(out1handle,"%e\n",(float) A_R[j][i]);
    fclose(out1handle);

    /* FORM KL TRANSFORM OF IMAGINARY VECTORS */

    /* decompose matrix and rank eigenvalues/vectors of A_I */
    svdcmp(A_I,n,n,S,V);
    eigsrt(S,A_I,n);

    /* open KL_I transform file */
    out1handle = fopen(transformI,"w");
    if (out1handle == NULL)
    {   printf("Can't open file %s",transformI);
        exit(-1);
    }

    /* write eigenvectors to KL_I transform file */
    for (i=1; i<=n; i++)
        for (j=1; j<=n; j++)
            fprintf(out1handle,"%e\n",(float) A_I[j][i]);
    fclose(out1handle);

    /* open accumulation file */
    out2handle = fopen(accumI,"w");
    if (out2handle == NULL)
    {   printf("Can't open file %s",accumI);
        exit(-1);
    }

    /* write averaged condition number for each rank to file */
    for (i=1; i<=n; i++)
        fprintf(out2handle,"%e\n",(float) ((W[1]/W[i])+(S[1]/S[i]))/2);
    fclose(out2handle);
}
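
NOTE ON THE CONDITION NUMBER CALCULATION

The final loop of the program above can be illustrated on its own. The following is a minimal sketch under stated assumptions, not the thesis code: it uses 0-based arrays instead of the 1-based recipes.h vectors, assumes both lists of singular values are already sorted in descending order, and adds a zero-divisor guard that the listing does not have. The function name averaged_condition_numbers is hypothetical.

```c
#include <stddef.h>

/* For each rank i, average the two condition numbers w[0]/w[i] and
   s[0]/s[i], where w and s hold the singular values of the real and
   imaginary covariance matrices in descending order (0-based).
   Returns 0 on success, or -1 if a zero singular value is met
   (a guard added for this sketch; the listing divides unconditionally). */
int averaged_condition_numbers(const double *w, const double *s,
                               size_t len, double *out)
{
    size_t i;

    for (i = 0; i < len; i++)
    {
        if (w[i] == 0.0 || s[i] == 0.0)
            return -1;
        out[i] = 0.5 * (w[0] / w[i] + s[0] / s[i]);
    }
    return 0;
}
```

For example, singular values {8, 4, 2} and {9, 3, 1} give averaged condition numbers 1, 2.5 and 6.5 for ranks one to three.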




LISTING TWO


/* ENERGY RATIO PROGRAM


Command Line Inputs : filename of source speech.dat


Speech source files are first converted from the NeXT sound format into a
data format using the program sound_to. Sound_to creates two files, a header
file which contains the source file's header information and a data file of
integers between -32768 and 32768.

This program uses the original sound file's header with the file size
adjusted to the reconstructed file size (usually not same size as original).

The program prompts the user for the number of coefficients used for the
reduction experiment and reads the energy represented by that number of
coefficients.


Outputs : temp_2.dat
        : error_2.dat


Program written by FLTLT Don Dryley at AFIT Sep 92

*/


#include  <fcntl.h> 
#include  <stdio.h> 
#include  <math.h> 
#include  <string.h> 
#include  <stdlib.h> 
#include "recipes.h"


#define  MAX_AMP  32768 
#define  N  256 
#define  n  128 


typedef struct
{
    int magic;
    int DataLocation;
    int DataSize;
    int DataFormat;
    int SamplingRate;
    int ChannelCount;
    char info[4];
} SNDheader;

void transform_(trans_R,trans_I,dim,in_R,in_I,size,KL_R,KL_I)
double *trans_R,*trans_I,*in_R,*in_I,**KL_R,**KL_I;
int dim, size;

/* function transforms two input vectors of dimension size */
/* into two vectors of coefficients of dimension dim using */
/* the transformation matrices KL_R and KL_I */

{
    int i,j;

    /* transform = KL multiplied by in_vector */
    for (i=1; i<=dim; i++)
    {
        trans_R[i] = trans_I[i] = 0.0;
        for (j=1; j<=size; j++)
        {
            trans_R[i] += KL_R[i][j] * in_R[j];
            trans_I[i] += KL_I[i][j] * in_I[j];
        }
    }
}


void inverse_transform_(inv_R,inv_I,size,red_R,red_I,dim,KL_R,KL_I)
double *inv_R,*inv_I,*red_R,*red_I,**KL_R,**KL_I;
int size, dim;

/* function inverse transforms two vectors of coefficients of */
/* dimension dim into two output vectors of dimension size */
/* using the transformation matrices KL_R and KL_I */

{
    int i,j;

    /* inverse transform = KL transpose multiplied by coefficients */
    for (i=1; i<=size; i++)
    {
        inv_R[i] = inv_I[i] = 0.0;
        for (j=1; j<=dim; j++)
        {
            inv_R[i] += KL_R[j][i] * red_R[j];
            inv_I[i] += KL_I[j][i] * red_I[j];
        }
    }
}

void error_spectrum_(size,org_R,org_I,rec_R,rec_I,error_R,error_I)
double *org_R, *org_I, *rec_R, *rec_I, *error_R, *error_I;
int size;

/* function generates a normalised error spectrum by, for each */
/* frequency, dividing the difference between the original and */
/* reconstructed spectrums by the original spectrum. Assume */
/* that the original, reconstructed, and error vectors */
/* are created prior to execution with the function vector */

{
    int i;

    for (i=1; i<=size; i++)
    {
        if (org_R[i] != 0.0)
            error_R[i] = (org_R[i] - rec_R[i])/org_R[i];
        else if (rec_R[i] != 0.0)
            error_R[i] = (org_R[i] - rec_R[i])/rec_R[i];
        else error_R[i] = 0.0;

        if (org_I[i] != 0.0)
            error_I[i] = (org_I[i] - rec_I[i])/org_I[i];
        else if (rec_I[i] != 0.0)
            error_I[i] = (org_I[i] - rec_I[i])/rec_I[i];
        else error_I[i] = 0.0;
    }
}


void main(argc,argv)
int argc;
char *argv[];

{
    SNDheader SND;
    FILE *inhandle, *outhandle;
    short *tempdata;
    int i, j, k, vectors, integers_read, tempint, dim;
    long *buff, templong;
    float tempfloat;
    double **KL_R, **KL_I, *averageR, *averageI, *d, *org_R, *org_I,
           *rec_R, *rec_I, *reduce_R, *reduce_I, DC, mag_R, mag_I,
           *ferr_R, *ferr_I, *onetimeR, *onetimeI;
    char KLT_R[] = "KLT_Re.dat",
         KLT_I[] = "KLT_Im.dat",
         AVG_R[] = "avgs_Re.dat",
         AVG_I[] = "avgs_Im.dat",
         temp[] = "temp_2.dat",
         error[] = "error_2.dat",
         eigenvalR[] = "eig_Re.dat",
         eigenvalI[] = "eig_Im.dat";


    /* CHECK ARGUMENTS */
    if (argc != 2)
    {   printf("Format: D>reduce speech.dat ");
        exit(-1);
    }

    /* PROMPT FOR NUMBER OF KL COEFFICIENTS */
    printf("Enter number of coefficients <1 - %d> ... ",n);
    scanf("%d",&dim);
    printf("\n\n");

    /* open Real eigenvalues file */
    inhandle = fopen(eigenvalR,"r");
    if (inhandle == NULL)
    {   printf("Can't open file %s.",eigenvalR);
        exit(-1);
    }

    /* accumulate first dim eigenvalues */
    mag_R = 0.0;
    for (i=1; i<=dim; i++)
    {
        fscanf(inhandle,"%e",&tempfloat);
        mag_R += (double) tempfloat;
    }
    fclose(inhandle);

    /* open Imaginary eigenvalues file */
    inhandle = fopen(eigenvalI,"r");
    if (inhandle == NULL)
    {   printf("Can't open file %s.",eigenvalI);
        exit(-1);
    }

    /* accumulate first dim eigenvalues */
    mag_I = 0.0;
    for (i=1; i<=dim; i++)
    {
        fscanf(inhandle,"%e",&tempfloat);
        mag_I += (double) tempfloat;
    }
    fclose(inhandle);


    /* CREATE MATRICES */
    buff = lvector(1,N);        /* input buffer */
    d = dvector(1,2*N);         /* FFT of speech data is complex */
    averageR = dvector(1,n);    /* average of real frequency components */
    averageI = dvector(1,n);    /* average of imaginary frequency components */
    KL_R = dmatrix(1,n,1,n);    /* holds KL transform for real components */
    KL_I = dmatrix(1,n,1,n);    /* holds KL transform for imaginary components */
    org_R = dvector(1,n);       /* holds original frequency vector */
    org_I = dvector(1,n);       /* holds original frequency vector */
    rec_R = dvector(1,n);       /* holds reconstructed frequency vector */
    rec_I = dvector(1,n);       /* holds reconstructed frequency vector */
    reduce_R = dvector(1,n);    /* holds set of coefficients */
    reduce_I = dvector(1,n);    /* holds set of coefficients */
    onetimeR = dvector(1,n);    /* holds error spectrum for a single vector */
    onetimeI = dvector(1,n);    /* holds error spectrum for a single vector */
    ferr_R = dvector(1,n);      /* holds accumulated error over all vectors */
    ferr_I = dvector(1,n);      /* holds accumulated error over all vectors */


    /* READ TRANSFORM FROM FILE */

    /* open Real transform file */
    inhandle = fopen(KLT_R,"r");
    if (inhandle == NULL)
    {   printf("Can't open file %s.",KLT_R);
        exit(-1);
    }

    /* load KL from transform file */
    for (i=1; i<=n; i++)
        for (j=1; j<=n; j++)
        {
            fscanf(inhandle,"%e",&tempfloat);
            KL_R[i][j] = (double) tempfloat;
        }
    fclose(inhandle);

    /* open Imaginary transform file */
    inhandle = fopen(KLT_I,"r");
    if (inhandle == NULL)
    {   printf("Can't open file %s.",KLT_I);
        exit(-1);
    }

    /* load KL from transform file */
    for (i=1; i<=n; i++)
        for (j=1; j<=n; j++)
        {
            fscanf(inhandle,"%e",&tempfloat);
            KL_I[i][j] = (double) tempfloat;
        }
    fclose(inhandle);


    /* READ FREQUENCY AVERAGES FROM FILE */

    /* open real averages file */
    inhandle = fopen(AVG_R,"r");
    if (inhandle == NULL)
    {   printf("Can't open file %s.",AVG_R);
        exit(-1);
    }

    /* load averages from averages file */
    for (j=1; j<=n; j++)
    {
        fscanf(inhandle,"%e",&tempfloat);
        averageR[j] = (double) tempfloat;
    }
    fclose(inhandle);

    /* open Imaginary averages file */
    inhandle = fopen(AVG_I,"r");
    if (inhandle == NULL)
    {   printf("Can't open file %s.",AVG_I);
        exit(-1);
    }

    /* load averages from averages file */
    for (j=1; j<=n; j++)
    {
        fscanf(inhandle,"%e",&tempfloat);
        averageI[j] = (double) tempfloat;
    }
    fclose(inhandle);


    /* KL TRANSFORMATION EXPERIMENT */

    /* open temporary file for data */
    outhandle = fopen(temp,"w");
    if (outhandle == NULL)
    {   printf("Can't open file %s.",temp);
        exit(-1);
    }

    /* open source file for reading */
    inhandle = fopen(argv[1],"r");
    if (inhandle == NULL)
    {   printf("Can't open file %s.",argv[1]);
        exit(-1);
    }

    /* count number of vectors in source file */
    integers_read = vectors = 0;
    while (fscanf(inhandle,"%d",&tempint) != EOF)
    {
        integers_read++;
        if (integers_read == N)
        {
            integers_read = 0;
            vectors++;
        }
    }
    rewind(inhandle);

    /* transform the data file */
    integers_read = 0;
    j = 1;
    while (j <= vectors)
    {
        fscanf(inhandle,"%d",&tempint);
        integers_read++;
        buff[integers_read] = tempint;

        /* transform a vector */
        if (integers_read == N)
        {
            /* load input array to fourier transform */
            for (k=1; k<=N; k++)
            {
                d[2*k-1] = (double) buff[k];
                d[2*k-1] /= MAX_AMP;
            }

            /* overwrite input with complex frequencies */
            four1(d,N,1);

            /* extract real, imaginary and DC components */
            rect(d,n,org_R,org_I,&DC);

            /* subtract mean from each component */
            for (k=1; k<=n; k++)
            {
                org_R[k] -= averageR[k];
                org_I[k] -= averageI[k];
            }

            /* generate KL coefficients */
            transform_(reduce_R,reduce_I,dim,org_R,org_I,n,KL_R,KL_I);

            /* reconstruct frequency vector from KL coefficients */
            inverse_transform_(rec_R,rec_I,n,reduce_R,reduce_I,dim,
                               KL_R,KL_I);

            /* clear input array for inverse FFT */
            for (k=1; k<=2*N; k++)
                d[k] = 0.0;

            /* insert DC value */
            d[1] = DC;

            /* load input array with reconstructed spectrum */
            for (k=2; k<=n+1; k++)
            {
                d[2*k-1] = rec_R[k-1] + averageR[k-1];
                d[2*k] = rec_I[k-1] + averageI[k-1];
            }

            /* load complex conjugates for the upper half of the spectrum */
            for (k=2; k<=n; k++)
            {
                d[N+2*k-1] = rec_R[n-k+1] + averageR[n-k+1];
                d[N+2*k] = -1.0 * (rec_I[n-k+1] + averageI[n-k+1]);
            }

            /* reconstruct time domain data */
            four1(d,N,-1);

            /* write reconstructed time domain data to output file */
            for (k=1; k<=N; k++)
            {
                d[2*k-1] *= MAX_AMP;
                d[2*k-1] /= N;
                fprintf(outhandle,"%d\n",(int) d[2*k-1]);
            }

            /* clear input array for next vector */
            for (k=1; k<=2*N; k++)
                d[k] = 0.0;

            /* increment vector counter */
            j++;

            /* reset elements per vector counter */
            integers_read = 0;
        }
    }

    fclose(inhandle);
    fclose(outhandle);

    /* DETERMINE ERROR FOR EXPERIMENT */

    /* open file for error data */
    outhandle = fopen(error,"a");
    if (outhandle == NULL)
    {   printf("Can't open file %s.",error);
        exit(-1);
    }

    /* write number of coefficients used this experiment */
    fprintf(outhandle,"%d ",dim);

    /* write accumulated energy for real and imaginary coefficients */
    fprintf(outhandle,"%e %e",mag_R,mag_I);

    fclose(outhandle);
}
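
NOTE ON THE ENERGY RATIO

The program above writes the sum of the first dim eigenvalues for each covariance matrix. To express that sum as a ratio it has to be divided by the sum of all the eigenvalues; the listing does not show that step, so the normalisation below is an assumption about how the ratio would be formed. The following is a minimal sketch with 0-based arrays and a hypothetical function name energy_ratio.

```c
#include <stddef.h>

/* Fraction of the total energy (sum of all eigenvalues) captured by the
   first dim eigenvalues. eig holds len eigenvalues in descending order
   (0-based). dim is clamped to len. Returns 0.0 when the total is not
   positive. Dividing by the total is an assumption of this sketch. */
double energy_ratio(const double *eig, size_t len, size_t dim)
{
    double kept = 0.0, total = 0.0;
    size_t i;

    if (dim > len)
        dim = len;
    for (i = 0; i < len; i++)
    {
        total += eig[i];
        if (i < dim)
            kept += eig[i];
    }
    return total > 0.0 ? kept / total : 0.0;
}
```

For example, with eigenvalues {4, 3, 2, 1} the first two coefficients carry 7 of the 10 units of energy, a ratio of 0.7.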
TRAINING SET


There comes sly shoe.
That blue shack.
Then move but soon.
She shock beans.
Black peep slew.
The slack moon.


TESTING SET